
I have a CSV file in GCS, and there is one huge table in BigQuery called emp_target.

Currently I read this CSV file using Spark like this:

df = spark.read.format("csv").option()...load()
df.createOrReplaceTempView("empTempView")
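Spelled out with a placeholder bucket path and placeholder CSV options (not my real values), the read is roughly:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-bigquery-join").getOrCreate()

# Placeholder GCS path and CSV options; the real job uses different values.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("gs://my-bucket/emp/*.csv")
)
df.createOrReplaceTempView("empTempView")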

Now I need to join this view (empTempView) with emp_target using a query like:

query = "select e.empid,e.empname,e.salary,e.department, t.managerID from empTempView e inner join dataset.emp_target t on e.empid=t.empid"

I tried to execute this using two methods:

Method 1:

res_df = spark.sql(query)

Method 1 did not work and gave me an error saying something like empTempView does not exist in BigQuery.

Method 2:

res_df = spark.read.format("bigquery").option(..).option("dbtable",query)...

Method 2 gave me the same error.
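Spelled out, what I am attempting in Method 2 is to push the whole join query down to BigQuery through the connector. The sketch below is my best understanding of the connector's query option; the viewsEnabled/materializationDataset settings and the scratch dataset name are assumptions and placeholders:

# Push the join query to BigQuery itself via the spark-bigquery connector.
# "scratch_dataset" is a placeholder; viewsEnabled/materializationDataset reflect
# my reading of what the query option requires. This fails the same way, because
# empTempView exists only in Spark, not in BigQuery.
res_df = (
    spark.read.format("bigquery")
    .option("viewsEnabled", "true")
    .option("materializationDataset", "scratch_dataset")
    .option("query", query)
    .load()
)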

Note: I do not have the option to write the temp view into BigQuery and do the join there, and I cannot load emp_target into a Spark DataFrame since it is huge.

How can I join these two different datasets in Spark and process them on Dataproc?
