
I have a Glue job with the configuration below that writes files to S3 using spark.write, and the write step is slow: it produces 544 files of ~7.5 MB each. Using coalesce(16) instead generates 16 files of ~2.5 GB each, which didn't help much.
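As a side note, one common rule of thumb is to size Parquet output files at around 128 MB and derive the partition count from the total data volume. A minimal sketch in plain Python, using the file counts and sizes stated above (the 128 MB target is an assumption, not something from the question):

```python
import math

# Numbers taken from the question: 544 output files of ~7.5 MB each.
num_files = 544
file_size_mb = 7.5

# Assumed target size per output file (a common Parquet guideline).
target_file_mb = 128

total_mb = num_files * file_size_mb              # ~4080 MB in total
partitions = math.ceil(total_mb / target_file_mb)

print(partitions)  # -> 32, i.e. repartition(32) for ~128 MB files
```

By that estimate, `coalesce(16)` would only be expected to yield ~255 MB files, so the reported 2.5 GB files suggest the job is reading far more data than the 544 * 7.5 MB it finally writes.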

Glue Config:

  • worker type: G.1X
  • max number of workers: 10
  • glue version 5.0

This job basically selects data from a partitioned Athena table and finally writes the result to S3. The S3 write is taking a lot of time.

select_query = (
    f"SELECT * FROM table1 "
    f"WHERE year='{year}' AND month='{month}' AND day='{day}' AND hour='{hour}' AND col1='abc' "
    f"AND col2='123' AND col3 IN ('ABC12','CDE23','DEF34','GHI23') "
    f"AND col4='NEW' "
    f"AND key IN ('val1', 'val2')"
)

data_df = spark.sql(select_query)
data_df.write.mode("append").parquet(athena_output_location)

Tags: python. Title: Aws glue job is very slow while writing to s3 using pyspark write (Stack Overflow)