
I have a CSV file in GCS, and there is one huge table in BigQuery called emp_target.

Currently I read this CSV file using Spark like this:

df = spark.read.format("csv").option()...load()
df.createOrReplaceTempView("empTempView")
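Spelled out with a placeholder bucket path and placeholder CSV options (not my real values), the read is roughly:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-bigquery-join").getOrCreate()

# Placeholder GCS path and CSV options; the real job uses different values.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("gs://my-bucket/emp/*.csv")
)
df.createOrReplaceTempView("empTempView")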

Now I need to join this view (empTempView) with emp_target using a query like:

query = "select e.empid,e.empname,e.salary,e.department, t.managerID from empTempView e inner join dataset.emp_target t on e.empid=t.empid"

I tried to execute this using two methods:

Method 1:

res_df = spark.sql(query)

Method 1 did not work and gave me an error saying something like empTempView does not exist in BigQuery.

Method 2:

res_df = spark.read.format("bigquery").option(..).option("dbtable",query)...

Method 2 gave me the same error.
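Spelled out, what I am attempting in Method 2 is to push the whole join query down to BigQuery through the connector. The sketch below is my best understanding of the connector's query option; the viewsEnabled/materializationDataset settings and the scratch dataset name are assumptions and placeholders:

# Push the join query to BigQuery itself via the spark-bigquery connector.
# "scratch_dataset" is a placeholder; viewsEnabled/materializationDataset reflect
# my reading of what the query option requires. This fails the same way, because
# empTempView exists only in Spark, not in BigQuery.
res_df = (
    spark.read.format("bigquery")
    .option("viewsEnabled", "true")
    .option("materializationDataset", "scratch_dataset")
    .option("query", query)
    .load()
)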

Note: I do not have the option to write the temp view into BigQuery and do the join there, and I cannot load emp_target into a Spark DataFrame since it is huge.

How can I join these two different datasets in Spark and process them on Dataproc?
