pySpark: Hadoop AWS S3 requester-pays.enabled config doesn't work
I am trying to read from an AWS S3 bucket with pySpark. The bucket is requester-pays, so the reader must opt in to being billed.
It doesn't work, even though similar credentials work with the aws-cli. I believe the spark.hadoop.fs.s3a.requester-pays.enabled config is the problem, because if I remove the --request-payer requester parameter from the aws-cli call, I get exactly the same error.
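For comparison, the aws-cli call that does work has this shape (the bucket and key here are placeholders, not my real ones):

aws s3 cp s3://example-requester-pays-bucket/some/key.parquet . --request-payer requester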
Below is my pySpark configuration code:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MainnetBlocksStreamingJob") \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk-bundle:1.11.375") \
    .config("spark.hadoop.fs.s3a.access.key", S3_ACCESS_KEY) \
    .config("spark.hadoop.fs.s3a.secret.key", S3_SECRET_KEY) \
    .config("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com") \
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .config("spark.hadoop.fs.s3a.requester-pays.enabled", "true") \
    .config("spark.hadoop.fs.s3a.requester.pays.enabled", "true") \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider") \
    .getOrCreate()
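For context, the read itself is nothing special. A minimal sketch of what I'm doing (the bucket, path, and format here are hypothetical placeholders):

# Hypothetical requester-pays bucket and path, for illustration only
df = spark.read.parquet("s3a://example-requester-pays-bucket/blocks/")
df.show(5)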
And I am running the job with this spark-submit command:
spark-submit \
  --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk-bundle:1.11.375 \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf spark.hadoop.fs.s3a.requester-pays.enabled=true \
  dataproc_jobs/streaming.py
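In case it helps with debugging, here is one way to confirm whether the property actually reaches the Hadoop configuration inside the job (a sketch that goes through SparkContext's private _jsc handle, so not a stable API):

# Inspect the effective Hadoop configuration from within the job.
# _jsc is a private SparkContext attribute; treat this as a debugging
# sketch rather than a supported interface.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
print(hconf.get("fs.s3a.requester-pays.enabled"))
print(hconf.get("fs.s3a.requester.pays.enabled"))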
Thank you.