I have wasted a considerable amount of time trying to make the PySpark exceptAll() function work, and as far as I understood it was failing (not recognizing rows that exist in the target table) because the column order of the two dataframes was slightly different. Therefore, I want to confirm and further understand:
- Does the PySpark exceptAll() function require both dataframes to have the same column order?
- Isn't it intelligent enough to map columns with the same names?
Thanks
asked Feb 20 at 11:04 by David Sánchez, edited Feb 25 at 10:07 by Ged
Comment: Spark complies with SQL standard behavior to process columns positionally. – mazaneicha, Feb 20 at 13:33
1 Answer
Yes, you are correct. The below would work.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Simulating data; this could come from a file
data1 = [("abc", 1), ("def", 2), ("xyz", 26)]
cols1 = ["val1", "val2"]
data2 = [(1, "abc"), (2, "def"), (26, "xyz")]
cols2 = ["val2", "val1"]
df1 = spark.createDataFrame(data1, cols1)
df2 = spark.createDataFrame(data2, cols2)

# Get the column names from both DataFrames, a nice little feature
columns_df1 = df1.columns
columns_df2 = df2.columns

# Sort both DataFrames into the same column order. No renaming is done
# here, but that could be needed as well in some cases, e.g. withColumnRenamed.
df1 = df1.select(sorted(columns_df1))
df2 = df2.select(sorted(columns_df2))

# Apply the except
df1.exceptAll(df2).show()
#df1.show()
#df2.show()
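The sort matters because exceptAll() is essentially a positional multiset (bag) difference over rows: a row from the left side is removed once for each positionally identical row on the right. A minimal pure-Python sketch of that semantics (no Spark needed; the helper name is just for illustration):

```python
from collections import Counter

def except_all(rows1, rows2):
    """Multiset difference: remove each row of rows2 from rows1 at most
    once per occurrence, comparing tuples positionally, which mirrors
    how Spark's exceptAll matches rows."""
    remaining = Counter(rows1) - Counter(rows2)
    return [row for row, n in remaining.items() for _ in range(n)]

# Same data as above, with df2's columns in the opposite order.
rows1 = [("abc", 1), ("def", 2), ("xyz", 26)]          # (val1, val2)
rows2_swapped = [(1, "abc"), (2, "def"), (26, "xyz")]  # (val2, val1)

# Without aligning the column order, nothing matches positionally,
# so every row of rows1 survives the difference.
print(except_all(rows1, rows2_swapped))

# After reordering rows2 into (val1, val2), the difference is empty.
rows2_aligned = [(b, a) for a, b in rows2_swapped]
print(except_all(rows1, rows2_aligned))  # []
```

This also shows why duplicates are preserved: the Counter subtraction removes only as many copies as appear on the right-hand side.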
The below would not work without the sorting step, because the columns are compared positionally. You would need to rename or reorder the columns first.
df1 = spark.createDataFrame(data1, cols1)
df2 = spark.createDataFrame(data2, cols2)
Columns whose data types differ can be compared, though.
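As an alternative to sorting both column lists, when the two frames share the same column names you can project the right-hand frame into the left one's order before the call, e.g. df1.exceptAll(df2.select(df1.columns)). A pure-Python sketch of what that name-based alignment does, using dicts to stand in for rows (the helper name is illustrative):

```python
def align_to(columns, records):
    """Reorder each record (a dict of column name -> value) into the given
    column order, mimicking what df.select(columns) does to a DataFrame."""
    return [tuple(rec[c] for c in columns) for rec in records]

target_cols = ["val1", "val2"]  # df1's column order
records2 = [{"val2": 1, "val1": "abc"}, {"val2": 2, "val1": "def"}]

# Each record is emitted as a tuple in (val1, val2) order.
print(align_to(target_cols, records2))  # [('abc', 1), ('def', 2)]
```

After this projection, a positional comparison such as exceptAll lines up the values that belong to the same column name.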