
In the Scala code below, I am reading a Parquet file, amending the value of one column, and writing the new DataFrame out to a new Parquet file:

import org.apache.spark.sql.functions.lit

val df = spark.read.parquet(sourcePath)
val newDf = df.withColumn("my_num_field", lit(11.10)) // replace the column with a constant value
newDf.write.parquet(targetPath)

Running the above code generates a new Parquet file with the new value 11.10 for my_num_field. However, the schema type is being changed for this field and a few other fields whose original schema type was:

"type" : [ "null", {
      "type" : "fixed",
      "name" : "my_num_field",
      "size" : 16,
      "logicalType" : "decimal",
      "precision" : 14,
      "scale" : 3
    }

And the new data type is now:

"type" : "double"

This produces the following error when I load the Parquet file on HDFS and run a SELECT query:

incompatible Parquet schema for column 'my_db.my_table.my_num_field'. Column type: DECIMAL(14,3), Parquet schema: required double my_num_field

How can I retain the original schema?

I have already tried a few things suggested on Stack Overflow, and they all produce the same outcome:

  • Added a cast, .cast(DecimalType(14, 3)), after lit(11.10) (see the sketch after this list).
  • Set the overwrite-schema flag during both read and write: .option("overwriteSchema", "false").
  • Set the merge-schema flag: .option("mergeSchema", "false"), both with and without the overwriteSchema option above.
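
For reference, the cast attempt looked roughly like this (a sketch; DecimalType comes from org.apache.spark.sql.types, and df and targetPath are the same as in the first snippet):

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.DecimalType

// Cast the literal back to the source column's declared precision and scale
val newDf = df.withColumn("my_num_field", lit(11.10).cast(DecimalType(14, 3)))
newDf.write.parquet(targetPath)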
