I have a CSV file in GCS, and there is one huge table in BigQuery called emp_target.
Currently I read this CSV file with Spark like this:
```python
df = spark.read.format("csv").option()...load()
df.createOrReplaceTempView("empTempView")
```
Now I need to join this view (empTempView) with emp_target using a query like:
```python
query = (
    "select e.empid, e.empname, e.salary, e.department, t.managerID "
    "from empTempView e inner join dataset.emp_target t on e.empid = t.empid"
)
```
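BigQuery only knows about its own tables, and Spark SQL only knows about its own catalog, so neither engine can run this query as written. One possible workaround, sketched here under assumptions rather than tested, is to push only the BigQuery half of the query down: select just the columns the join needs, restricted to the empid values that actually occur in the CSV. The helpers `to_bq_literal` and `build_pushdown_query` are hypothetical; they assume empid is an integer or string column and that the distinct CSV keys are few enough to collect on the driver (for example with `[r.empid for r in df.select("empid").distinct().collect()]`). Very long IN lists can run into BigQuery's query length limits.

```python
from typing import Iterable, Sequence, Union

# Assumption: employee ids are ints or strings.
Key = Union[int, str]


def to_bq_literal(value: Key) -> str:
    """Render an int or str key as a BigQuery GoogleSQL literal."""
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"unsupported key type: {type(value).__name__}")
    if isinstance(value, int):
        return str(value)
    # Escape backslashes first, then single quotes, for a single-quoted literal.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_pushdown_query(
    table: str, columns: Sequence[str], key_column: str, keys: Iterable[Key]
) -> str:
    """Build a BigQuery-only SELECT restricted to the given join keys (hypothetical helper)."""
    unique_keys = list(dict.fromkeys(keys))  # de-duplicate, keep first-seen order
    if not unique_keys:
        raise ValueError("no keys to push down")
    in_list = ", ".join(to_bq_literal(k) for k in unique_keys)
    return (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"WHERE {key_column} IN ({in_list})"
    )


# Made-up ids for illustration; in Spark they would come from the CSV DataFrame.
sql = build_pushdown_query(
    "dataset.emp_target", ["empid", "managerID"], "empid", [101, 102, 101]
)
print(sql)
# SELECT empid, managerID FROM dataset.emp_target WHERE empid IN (101, 102)
```

If this route is taken, the SQL string would go to the spark-bigquery-connector's `query` option. As far as I know from the connector's README, that mode requires `viewsEnabled` set to `"true"` and a `materializationDataset` in which BigQuery writes the query result as a temporary table, which may clash with the no-write constraint in the note below. The resulting DataFrame could then be registered with `createOrReplaceTempView` and joined with empTempView through `spark.sql`.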
I tried to execute this in two ways:
Method 1:
```python
res_df = spark.sql(query)
```
Method 1 did not work; it failed with an error along the lines of "empTempView does not exist in BigQuery".
Method 2:
```python
res_df = spark.read.format("bigquery").option(..).option("dbtable",query)...
```
Method 2 failed with the same error.
Note: I do not have the option of writing the temp view into BigQuery and doing the join there, and I cannot load emp_target into a Spark DataFrame because it is huge.
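On the size concern: as far as I understand, a DataFrame created with `spark.read.format("bigquery").option("table", "dataset.emp_target").load()` is lazy, and the connector pushes column selection and simple filters down to the BigQuery Storage Read API, so defining it does not by itself pull the whole table into Spark. The connector also documents a `filter` option that takes a manual predicate. The helper below, `build_key_filter`, is hypothetical and assumes integer empid values: it produces an exact IN list when there are few distinct keys and otherwise a BETWEEN range that reads a superset of the matching rows, which the inner join in Spark afterwards discards. The 1000-key threshold is an arbitrary assumption.

```python
from typing import Iterable


def build_key_filter(
    key_column: str, keys: Iterable[int], max_in_list: int = 1000
) -> str:
    """Build a pushdown predicate for integer join keys (hypothetical helper).

    Assumption: the key column is an integer column.
    Few keys -> exact IN list; many keys -> a BETWEEN range covering all of them.
    The range may match extra rows, which the later inner join drops.
    """
    unique_keys = sorted(set(keys))
    if not unique_keys:
        raise ValueError("no keys to build a filter from")
    if len(unique_keys) <= max_in_list:
        return f"{key_column} IN ({', '.join(str(k) for k in unique_keys)})"
    return f"{key_column} BETWEEN {unique_keys[0]} AND {unique_keys[-1]}"


print(build_key_filter("empid", [5, 3, 5]))
# empid IN (3, 5)
print(build_key_filter("empid", range(10), max_in_list=4))
# empid BETWEEN 0 AND 9
```

The returned string would be passed as `.option("filter", ...)` on the BigQuery read; the filtered DataFrame can then be registered as a temp view and joined with empTempView in `spark.sql`. If the CSV side is small, wrapping it in `pyspark.sql.functions.broadcast` for the join is another option worth trying.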
How can I join these two datasets in Spark and process the result on Dataproc?
Source: "pyspark - BigQuery-Spark Hybrid Query reader - Stack Overflow", user-contributed content reposted at http://www.betaflare.com/web/1744933816a2633060.html.