Combine values from a json in an Azure dataflow into one large array

I have a json object per row in an Azure dataflow and want to append all values to an array and then flatten it, so that each element of the array is a value rather than all values for that specific row.

My input data looks like this:

Column
{"Energy to Utility_kWh_15m": "ef9033a5-ca4c-44eb-9f20-5c8c0d4ca7d6", "Output Energy_kWh_15m": "849871d1-b5f5-4ae8-86ad-5030ce16cce5", "Plant Availability_%_15m": "5db1004a-bcdc-4973-9816-124262893d21"
{"Energy to Utility_kWh_15m": "97046418-371d-41d3-a213-5e9715847a34", "Output Energy_kWh_15m": "6dc86c06-1a5c-11e9-9358-42010afa015a", "Plant Availability_%_15m": "6dcac67c-1a5c-11e9-9358-42010afa015a"}
...


and I want my final output to look like:

New Column
"ef9033a5-ca4c-44eb-9f20-5c8c0d4ca7d6"
"849871d1-b5f5-4ae8-86ad-5030ce16cce5"
"5db1004a-bcdc-4973-9816-124262893d21"
"97046418-371d-41d3-a213-5e9715847a34"
"6dc86c06-1a5c-11e9-9358-42010afa015a
"6dcac67c-1a5c-11e9-9358-42010afa015a"
...

so that I can use the data in a ForEach pipeline activity and loop through each id.

I have the below solution that produces my expected output, where each select activity following the flatten selects a specific column (one of the key-value pairings). This is not a good solution, because as my keys expand, so too will the number of select activities required. I would like this to be dynamic, based on the keys in the JSON.

  • Do you want a new kind of solution, or can I provide the next steps after this previous SO answer? – Rakesh Govindula Commented Feb 21 at 11:38
  • Ideally I'd like a new solution where I don't have to add select activities for each key/column. I will be reusing the structure of this solution for a separate dataset that contains many more key-value pairings, so the current solution would be a problem as it would require probably another 12 select activities on top. – York Commented Feb 21 at 11:44
  • In this case, the key names are the same in every row, but you don't know how many there are going to be, right? – Rakesh Govindula Commented Feb 21 at 11:46
  • Yes, exactly: only the values differ and the keys are consistent. I just need a way to avoid adding select items for every possible key, because for another dataset there will be at least 20. – York Commented Feb 21 at 12:09
  • As you don't have control over the structure or keys of the JSON string, I will try string operations to achieve your requirement. I will update here. – Rakesh Govindula Commented Feb 21 at 12:37

1 Answer


You can try the below approach, but it will only work in this case, where there are no nested structures in your JSON strings and the values do not contain the double-quote character (").

First, take a Derived Column transformation after your source and create a new column sub_arr with the below expression.

slice(map(split(Column,'": "'),split(#item,'"')[1]),2)

This first splits the JSON string on '": "'; then, for each resulting substring, it splits again on '"' and takes the first item. The outer slice(..., 2) drops the leading '{"key' fragment, so the result is an array containing just the values of each JSON string row.
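For illustration only, here is a small Python sketch (not ADF data flow code) of what that expression does to one row; note that ADF data flow arrays are 1-based, so split(...)[1] in the expression corresponds to [0] below, and slice(..., 2) corresponds to dropping the first element.

# Python sketch of the derived-column logic, for illustration only.
row = ('{"Energy to Utility_kWh_15m": "ef9033a5-ca4c-44eb-9f20-5c8c0d4ca7d6", '
       '"Output Energy_kWh_15m": "849871d1-b5f5-4ae8-86ad-5030ce16cce5", '
       '"Plant Availability_%_15m": "5db1004a-bcdc-4973-9816-124262893d21"}')

parts = row.split('": "')                  # split the JSON string on '": "'
mapped = [p.split('"')[0] for p in parts]  # keep the text before the next '"'
sub_arr = mapped[1:]                       # drop the leading '{"key' fragment

print(sub_arr)
# ['ef9033a5-ca4c-44eb-9f20-5c8c0d4ca7d6',
#  '849871d1-b5f5-4ae8-86ad-5030ce16cce5',
#  '5db1004a-bcdc-4973-9816-124262893d21']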

Next, to combine the arrays from each row, take an Aggregate transformation and create the required res_arr column in the Aggregates section with the below expression. There is no need to add any column in the Group by section.

flatten(collect(sub_arr))

Now, it will give the expected single array of all the values, which you can pass to the ForEach activity in your pipeline.
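As a rough illustration of that aggregate step (again a Python sketch, not ADF code): collect(sub_arr) gathers the per-row arrays, and flatten(...) merges them into one array.

# Python sketch of flatten(collect(sub_arr)), for illustration only.
per_row_arrays = [
    ['ef9033a5-ca4c-44eb-9f20-5c8c0d4ca7d6',
     '849871d1-b5f5-4ae8-86ad-5030ce16cce5',
     '5db1004a-bcdc-4973-9816-124262893d21'],
    ['97046418-371d-41d3-a213-5e9715847a34',
     '6dc86c06-1a5c-11e9-9358-42010afa015a',
     '6dcac67c-1a5c-11e9-9358-42010afa015a'],
]

# Flatten the collected per-row arrays into one array of ids.
res_arr = [value for sub_arr in per_row_arrays for value in sub_arr]
print(res_arr)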
