I have a federated connection to BigQuery that exposes the GA events tables for each of our projects. I'm trying to query each daily table, which contains about 400,000 rows, and load the results into another table, but I keep seeing this Simba JDBC exception.
I've even chunked the query (using an offset) to fetch and append 5000 rows at a time, with a sleep in between, but I still see this error:
SparkException: Job aborted due to stage failure: Task 0 in stage 2947.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2947.0 (TID 15843) (10.21.40.215 executor 20): java.sql.SQLException: [Simba][JDBC](11380) Null pointer exception.
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTDataHandler.retrieveData(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQResultSet.getData(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.jdbc.common.SForwardResultSet.getData(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.jdbc.common.SForwardResultSet.getString(Unknown Source)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13(JdbcUtils.scala:484)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13$adapted(JdbcUtils.scala:482)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:376)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFac...
File <command-6291825545273755>, line 88
85 df_chunk = df_chunk.withColumn("event_date", lit(event_date))
87 # Append chunk to Bronze table
---> 88 df_chunk.write.option("mergeSchema", "true").mode("append").saveAsTable(bronze_table)
90 offset += BATCH_SIZE
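
For context, the offset-based chunking described in the question amounts to roughly the following. This is only a minimal sketch, not the exact notebook code: the helper names `chunk_bounds` and `chunk_query`, the table name, and the `ORDER BY` column are hypothetical, and the 5000-row batch size is the one mentioned in the question. An `ORDER BY` is included on the assumption that pages should be stable between calls, since `LIMIT`/`OFFSET` without a deterministic order can return overlapping or missing rows.

```python
from typing import Iterator, Tuple

BATCH_SIZE = 5000  # chunk size mentioned in the question


def chunk_bounds(total_rows: int, batch_size: int = BATCH_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that cover total_rows without overlap."""
    offset = 0
    while offset < total_rows:
        # The last chunk may be smaller than batch_size.
        yield offset, min(batch_size, total_rows - offset)
        offset += batch_size


def chunk_query(table: str, offset: int, limit: int, order_by: str) -> str:
    """Build one paged SELECT (hypothetical helper; names are illustrative)."""
    return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {limit} OFFSET {offset}"


if __name__ == "__main__":
    # Hypothetical table and ordering column, for illustration only.
    for offset, limit in chunk_bounds(12000):
        print(chunk_query("my_catalog.analytics.events_20250101", offset, limit, "event_timestamp"))
```

In the notebook, each generated query would be run through the federated connection and the result appended to the Bronze table as on line 88 above, with a sleep between iterations.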