Does Spark exceptAll() require both dataframe columns to be in the same order?

I have wasted a considerable amount of time trying to get the PySpark exceptAll() function to work, and as far as I understood it was failing (not recognizing rows that already exist in the target table) because the column order of the two dataframes was slightly different. Therefore, I want to confirm and understand further:

  1. Does the PySpark exceptAll() function require both dataframes to have the same column order?
  2. Isn't it intelligent enough to map columns with the same names?

Thanks


  • Spark complies with SQL standard behavior to process columns positionally. – mazaneicha
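A minimal sketch of that positional behaviour (assuming an existing SparkSession bound to spark):

# Sketch only: exceptAll() compares rows column-by-column by position, not by name.
left = spark.createDataFrame([("abc", "x")], ["val1", "val2"])
right = spark.createDataFrame([("x", "abc")], ["val2", "val1"])  # same row, columns swapped

# Position 0 ("abc" vs "x") and position 1 ("x" vs "abc") do not match,
# so the left row is returned instead of an empty result.
left.exceptAll(right).show()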

1 Answer


Yes, you are correct: exceptAll() compares columns positionally, so both dataframes need to have the same column order. The below would work.

# Simulating data, could be from file
data1 = [("abc", 1), ("def", 2), ("xyz", 26)]
cols1 = ["val1", "val2"]
data2 = [(1, "abc"), (2, "def"), (26, "xyz")]
cols2 = ["val2", "val1"]
df1 = spark.createDataFrame(data1, cols1)
df2 = spark.createDataFrame(data2, cols2)

# Get col names from both DF's, nice little feature 
columns_df1 = df1.columns
columns_df2 = df2.columns

# Sort the column names; no renaming done here, but that could be needed as well in some cases, e.g. with withColumnRenamed
df1 = df1.select(sorted(columns_df1))
df2 = df2.select(sorted(columns_df2))

# Apply except
df1.exceptAll(df2).show()
#df1.show()
#df2.show()
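With the column order aligned, exceptAll() returns an empty result here, since every row of df1 also appears in df2.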

This below would not work as-is; you would need to rename the columns first, since the names differ rather than just the order.

colsA = ["valA", "valB"]  # illustrative: different column names, not just a different order
df1 = spark.createDataFrame(data1, colsA)
df2 = spark.createDataFrame(data2, cols2)
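A minimal sketch of the rename approach mentioned above; the target names val1/val2 are an assumption, chosen to match cols2:

# Sketch only: align the differing names with withColumnRenamed, then sort and compare.
df1 = df1.withColumnRenamed("valA", "val1").withColumnRenamed("valB", "val2")

df1.select(sorted(df1.columns)).exceptAll(df2.select(sorted(df2.columns))).show()
# Empty here, since every row of df1 also appears in df2 once the names and order match.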

Data types that differ can still be compared, though.
