tl;dr: How do I use SparkSession.newSession with changes to the SQL config?
I'm using PySpark within AWS Glue, creating a Glue 5 notebook.
I'd like to have two different SparkSessions, with different SQL configs (two different warehouses). Everything is Iceberg.
I can easily set up a session that works fine doing something like this:
warehouse_path = "s3://some_s3_bucket/path"
spark = SparkSession.builder \
    .config("spark.sql.warehouse.dir", warehouse_path) \
    .config("spark.sql.catalog.glue_catalog.warehouse", warehouse_path) \
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.parquet.compression.codec", "gzip") \
    .getOrCreate()
So, to have two different sessions, with different warehouse paths, I attempt to do something like this:
spark = SparkSession.builder \
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.parquet.compression.codec", "gzip") \
    .getOrCreate()
warehouse_path_1 = "s3://s3_bucket_1/path"
spark_session_1 = spark.newSession()
spark_session_1.conf.set("spark.sql.warehouse.dir", warehouse_path_1)
spark_session_1.conf.set("spark.sql.catalog.glue_catalog.warehouse", warehouse_path_1)
warehouse_path_2 = "s3://s3_bucket_2/path"
spark_session_2 = spark.newSession()
spark_session_2.conf.set("spark.sql.warehouse.dir", warehouse_path_2)
spark_session_2.conf.set("spark.sql.catalog.glue_catalog.warehouse", warehouse_path_2)
(I also tried the same thing with all of the SQL confs being set on the child sessions, not just the changed ones, with the same results.)
I end up with this error (or a similar one for whichever SQL conf I try to change first):
AnalysisException: Cannot modify the value of a static config: spark.sql.warehouse.dir
On the one hand, I understand that the SQLConf is static, but if you look at the docs for newSession, it says (emphasis mine):
Returns a new SparkSession as new session, that has separate SQLConf, registered temporary views and UDFs, but shared SparkContext and table cache.
So, if it has a "separate SQLConf", how can I actually set it up with different SQL options?
Tags: apache-spark, SparkSession.newSession with distinct SQLConf, Stack Overflow