tl;dr: How do I use SparkSession.newSession with changes to the SQL config?
I'm using PySpark within AWS Glue, creating a Glue 5 notebook.
I'd like to have two different SparkSessions, with different SQL configs (two different warehouses). Everything is Iceberg.
I can easily set up a session that works fine doing something like this:
warehouse_path = "s3://some_s3_bucket/path"
spark = SparkSession.builder \
    .config("spark.sql.warehouse.dir", warehouse_path) \
    .config("spark.sql.catalog.glue_catalog.warehouse", warehouse_path) \
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.parquet.compression.codec", "gzip") \
    .getOrCreate()
So, to have two different sessions, with different warehouse paths, I attempt to do something like this:
spark = SparkSession.builder \
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.parquet.compression.codec", "gzip") \
    .getOrCreate()
warehouse_path_1 = "s3://s3_bucket_1/path"
spark_session_1 = spark.newSession()
spark_session_1.conf.set("spark.sql.warehouse.dir", warehouse_path_1)
spark_session_1.conf.set("spark.sql.catalog.glue_catalog.warehouse", warehouse_path_1)
warehouse_path_2 = "s3://s3_bucket_2/path"
spark_session_2 = spark.newSession()
spark_session_2.conf.set("spark.sql.warehouse.dir", warehouse_path_2)
spark_session_2.conf.set("spark.sql.catalog.glue_catalog.warehouse", warehouse_path_2)
(I also tried the same thing with all of the SQL confs being set on the child sessions, not just the changed ones, with the same results.)
I end up with this error (or a similar one for whichever SQL conf I try to change first):
AnalysisException: Cannot modify the value of a static config: spark.sql.warehouse.dir
On the one hand, I understand that the SQLConf is static, but if you look at the docs for newSession, it says (emphasis mine):
Returns a new SparkSession as new session, that has separate SQLConf, registered temporary views and UDFs, but shared SparkContext and table cache.
So, if it has a "separate SQLConf", how can I actually set it up with different SQL options?
Tags: apache-spark, SparkSession.newSession with distinct SQLConf, Stack Overflow