apache spark - Writing data to ADW through JDBC in a PySpark environment performs poorly - Stack Overflow


I am trying to write PySpark DataFrames to ADW (Oracle Autonomous Data Warehouse) using JDBC in a Jupyter Lab environment, but the performance is low.

dataframe.write.format("jdbc").mode('overwrite').option("batchsize", batchsize).option('createTableColumnTypes', create_str).option("rewriteBatchedStatements", "true").option("url", jdbc_url).option("dbtable", table).option("user", self.user).option("password", self.password).option("driver", "oracle.jdbc.OracleDriver").save()

I'm using the rewriteBatchedStatements and batchsize parameters, but the performance is still bad.

Using other tools like DBeaver, the load performance is better. Could you suggest a guide or best practices to achieve this connection?

Environment: ojdbc8, Spark 3.5.0, Oracle 19c


Asked Nov 21, 2024 at 20:54 by danmo41

Comment (Devyl, Dec 1, 2024 at 17:43): Switch to Scala Spark and use the JDBC connection inside foreachPartition.
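The per-partition idea from this comment can also be tried without leaving PySpark. The following is only a rough sketch under assumptions, not the commenter's code: `insert_partition`, `connect`, and `insert_sql` are hypothetical names, and `connect` stands for any zero-argument function that returns a Python DB-API 2.0 connection (for example a small wrapper around `oracledb.connect` from the python-oracledb package, which would have to be installed on the executors).

```python
from itertools import islice


def insert_partition(rows, connect, insert_sql, batch_size=5000):
    """Insert one partition's rows in batches over a single DB-API connection.

    rows       -- iterator of rows handed over by DataFrame.foreachPartition
    connect    -- hypothetical zero-argument factory returning a DB-API connection
    insert_sql -- INSERT statement with positional binds, e.g. ":1, :2" for Oracle
    """
    conn = connect()
    try:
        cur = conn.cursor()
        it = iter(rows)
        while True:
            # Pull at most batch_size rows so memory use per task stays bounded.
            batch = [tuple(r) for r in islice(it, batch_size)]
            if not batch:
                break
            # One executemany call per batch instead of one round trip per row.
            cur.executemany(insert_sql, batch)
        # Commit once per partition, after all batches were sent.
        conn.commit()
    finally:
        conn.close()
```

With a DataFrame `df` this could be called as `df.foreachPartition(lambda rows: insert_partition(rows, make_connection, "INSERT INTO my_table VALUES (:1, :2)"))`, where `make_connection` and `my_table` are placeholders. Whether this beats the built-in JDBC writer depends on the data and the cluster, so it is worth measuring both.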


1 Answer


Oracle ADW performs best with batched or bulk loads. You can try raising the Oracle driver's default batch size by setting a connection property:

.option("oracle.jdbc.defaultBatchValue", "5000")

Also try setting defaultRowPrefetch to 100, which tells the Oracle driver how many rows to fetch per round trip when reading query results (the default is 10).

Hope it helps.
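For reference, the write from the question can be arranged so that the batch size and the degree of parallelism are explicit. This is a minimal sketch under assumptions, not a verified fix: `build_jdbc_write_options` and `write_to_adw` are hypothetical helper names, and the values 5000 and 8 are illustrative. `batchsize` and `numPartitions` are documented options of Spark's JDBC data source. `rewriteBatchedStatements` is left out because it is a MySQL Connector/J property and, as far as I know, has no effect with the Oracle driver.

```python
def build_jdbc_write_options(jdbc_url, table, user, password,
                             batchsize=5000, num_partitions=8):
    """Collect Spark JDBC writer options as strings (default values are illustrative)."""
    return {
        "url": jdbc_url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
        # Rows sent per JDBC batch; Spark's default is 1000.
        "batchsize": str(batchsize),
        # Upper bound on the number of concurrent JDBC connections for the write.
        "numPartitions": str(num_partitions),
    }


def write_to_adw(df, options, mode="overwrite"):
    """Write a Spark DataFrame through JDBC using the options above."""
    # Repartition so that several tasks insert in parallel, one connection each.
    partitions = int(options["numPartitions"])
    (df.repartition(partitions)
       .write.format("jdbc")
       .mode(mode)
       .options(**options)
       .save())
```

A call would look like `write_to_adw(dataframe, build_jdbc_write_options(jdbc_url, table, user, password))`. A single-partition DataFrame is written over one connection, which alone can explain slow loads, so checking `dataframe.rdd.getNumPartitions()` before tuning driver properties may be worthwhile.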
