While reading a JSON file and converting the resulting pandas DataFrame to a Spark DataFrame, the process stops with this error:

Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonFunction. Trace: py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist

import json

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load the raw JSON and flatten it into a pandas DataFrame
with open('data_file.json', 'r') as file:
    data = json.load(file)
df = pd.json_normalize(data)

# Keep only the two columns of interest
df = pd.DataFrame(df, columns=['Col1', 'Col2'])

# This call raises the Py4JError shown above
df_aws = spark.createDataFrame(df)

I tried using findspark:

import findspark
findspark.init('path/to/spark-3.5.4-bin-hadoop3')
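
This kind of constructor-not-found error is commonly reported when the pip-installed pyspark package and the Spark distribution on the JVM side are different versions, though that is not confirmed as the cause here. Below is a minimal sketch for comparing the two. It assumes the version can be read from the distribution folder name (as in spark-3.5.4-bin-hadoop3), and the helper names are hypothetical, not part of findspark or pyspark:

```python
import re
from importlib import metadata


def spark_version_from_home(spark_home):
    """Extract 'X.Y.Z' from a folder name such as spark-3.5.4-bin-hadoop3."""
    match = re.search(r"spark-(\d+\.\d+\.\d+)", spark_home)
    return match.group(1) if match else None


def installed_pyspark_version():
    """Return the version of the pip-installed pyspark package, or None."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def same_release_line(version_a, version_b):
    """Compare only major.minor; returns False if either version is unknown."""
    if version_a is None or version_b is None:
        return False
    return version_a.split(".")[:2] == version_b.split(".")[:2]


if __name__ == "__main__":
    # Placeholder path taken from the question
    home_version = spark_version_from_home("path/to/spark-3.5.4-bin-hadoop3")
    pip_version = installed_pyspark_version()
    print("Spark distribution:", home_version)
    print("pyspark package:  ", pip_version)
    print("Same release line:", same_release_line(home_version, pip_version))
```

If the two versions differ, aligning them (installing the matching pyspark release, or pointing SPARK_HOME at the matching distribution) would be a reasonable first thing to try.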

I also tried setting the Spark paths manually:

import os
import sys
spark_path = r"path/to/spark-3.5.4-bin-hadoop3" # spark installed folder
os.environ['SPARK_HOME'] = spark_path
sys.path.insert(0, spark_path + "/bin")
sys.path.insert(0, spark_path + "/python/pyspark/")
sys.path.insert(0, spark_path + "/python/lib/pyspark.zip")
sys.path.insert(0, spark_path + "/python/lib/py4j-0.10.7-src.zip")
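
The py4j archive name is hardcoded above as py4j-0.10.7-src.zip, which may not be the file that actually ships in python/lib of this distribution. Below is a sketch that looks the archive up instead of assuming its version. `find_py4j_zip` is a hypothetical helper, and the python/lib layout is taken from the paths above:

```python
import glob
import os


def find_py4j_zip(spark_home):
    """Return the py4j source zip found under spark_home/python/lib, or None."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


if __name__ == "__main__":
    spark_path = r"path/to/spark-3.5.4-bin-hadoop3"  # placeholder from the question
    py4j_zip = find_py4j_zip(spark_path)
    if py4j_zip is None:
        print("No py4j archive found under", spark_path)
    else:
        print("Bundled py4j archive:", py4j_zip)
```

The returned path could then be passed to `sys.path.insert` in place of the hardcoded file name.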

Neither approach works. Any assistance would be appreciated.
