Pyspark dataframe: Remove keys with null values in struct column

Sample Input value in struct column -

{"quantity_unit":"Sqm","u_quantity":"6", "capacity":null}

Required Output -

{"quantity_unit":"Sqm","u_quantity":"6"}

I tried the code below, but it converts the type to string. I want to retain the struct type itself.

from pyspark.sql.functions import to_json

df1 = df_cleaned.select(to_json(df_cleaned.new_col).alias("jsoncol"))  # returns a string column
display(df1)
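
For reference, here is a minimal sketch that reproduces this input (assuming string-typed fields and the column name new_col from the snippet above):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# One sample row with a null "capacity", packed into a struct column.
df_cleaned = spark.createDataFrame(
    [("Sqm", "6", None)],
    "quantity_unit string, u_quantity string, capacity string",
).select(F.struct("quantity_unit", "u_quantity", "capacity").alias("new_col"))

df_cleaned.printSchema()
# root
#  |-- new_col: struct (nullable = false)
#  |    |-- quantity_unit: string (nullable = true)
#  |    |-- u_quantity: string (nullable = true)
#  |    |-- capacity: string (nullable = true)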


1 Answer


What you're asking for isn't directly possible due to the way struct types work in Spark. The struct data type is designed to maintain a consistent schema across all rows, which means all defined fields, including those with null values, are retained to ensure the structure remains uniform.
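
To see this concretely: a field that is null in a given row still appears in that row's struct, because the schema belongs to the column, not to individual rows (reusing the df_cleaned sketch from the question above):

# The null field is retained in every row's struct value.
print(df_cleaned.first().new_col.asDict())
# {'quantity_unit': 'Sqm', 'u_quantity': '6', 'capacity': None}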

However, if your goal is to dynamically remove fields with null values, consider using a map type instead. A map holds flexible key-value pairs, so entries with null values can be filtered out easily; a map<string,string> works if all your values are strings.

This way, you can exclude the null values while still keeping the remaining data intact.

Here is how you can do it, assuming your dataframe is df and the column you want to "filter" is data:

from pyspark.sql import functions as F

df.withColumn(
    # A struct can't be cast straight to a map, so round-trip through JSON;
    # ignoreNullFields="false" keeps the nulls so map_filter does the filtering.
    "data",
    F.from_json(F.to_json("data", {"ignoreNullFields": "false"}), "map<string,string>"),
).withColumn(
    "filtered_data",
    F.expr("map_filter(data, (k, v) -> v is not null)"),
)
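
Putting it together, here is a self-contained sketch under the same assumptions (sample values taken from the question; the exact display formatting may vary by Spark version):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Sample row matching the question's input struct.
df = spark.createDataFrame(
    [("Sqm", "6", None)],
    "quantity_unit string, u_quantity string, capacity string",
).select(F.struct("quantity_unit", "u_quantity", "capacity").alias("data"))

result = df.withColumn(
    "data",
    F.from_json(F.to_json("data", {"ignoreNullFields": "false"}), "map<string,string>"),
).withColumn(
    "filtered_data",
    F.expr("map_filter(data, (k, v) -> v is not null)"),
)

result.select("filtered_data").show(truncate=False)
# filtered_data: {quantity_unit -> Sqm, u_quantity -> 6}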
