While reading a JSON file and converting the resulting pandas DataFrame to a Spark DataFrame, the process stops with this error:
Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonFunction. Trace: py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist
import pandas as pd
import json
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
spark = SparkSession.builder.getOrCreate()
with open('data_file.json', 'r') as file:
    data = json.load(file)
df = pd.json_normalize(data)
df = pd.DataFrame(df, columns=['Col1', 'Col2'])
df_aws = spark.createDataFrame(df)
I tried using findspark:
import findspark
findspark.init('path/to/spark-3.5.4-bin-hadoop3')
I also tried setting the Spark paths manually:
import os
import sys
spark_path = r"path/to/spark-3.5.4-bin-hadoop3" # spark installed folder
os.environ['SPARK_HOME'] = spark_path
sys.path.insert(0, spark_path + "/bin")
sys.path.insert(0, spark_path + "/python/pyspark/")
sys.path.insert(0, spark_path + "/python/lib/pyspark.zip")
sys.path.insert(0, spark_path + "/python/lib/py4j-0.10.7-src.zip")
Neither approach works. Any assistance would be appreciated.
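A "Constructor ... does not exist" Py4JError is often a sign that the Python-side pyspark/py4j packages and the Spark installation under SPARK_HOME come from different versions, though that is only one possible cause here. The sketch below is a hedged diagnostic, not a fix: it compares the Py4J archive bundled in the Spark folder with the packages installed in the Python environment. The helper names `bundled_py4j_version` and `installed_version` are hypothetical, and the script assumes SPARK_HOME points at the Spark folder used above.

```python
import importlib.metadata
import os
import re
from pathlib import Path
from typing import Optional


def bundled_py4j_version(spark_home: str) -> Optional[str]:
    """Return the Py4J version bundled under <spark_home>/python/lib, or None."""
    lib_dir = Path(spark_home) / "python" / "lib"
    if not lib_dir.is_dir():
        return None
    for archive in sorted(lib_dir.glob("py4j-*-src.zip")):
        match = re.fullmatch(r"py4j-(.+)-src\.zip", archive.name)
        if match:
            return match.group(1)
    return None


def installed_version(package: str) -> Optional[str]:
    """Return the version of a pip-installed package, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: SPARK_HOME is set to the Spark installation folder.
    spark_home = os.environ.get("SPARK_HOME", "")
    print("SPARK_HOME:", spark_home or "(not set)")
    print("py4j bundled with Spark:", bundled_py4j_version(spark_home))
    print("py4j installed in Python:", installed_version("py4j"))
    print("pyspark installed in Python:", installed_version("pyspark"))
```

If the printed versions disagree, or the bundled archive name differs from the `py4j-0.10.7-src.zip` hardcoded in the manual path setup, aligning the pip-installed pyspark version with the Spark installation is a reasonable first thing to try.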