I have a nested JSON document, but I can't figure out how to work with it.

{
    "return": {
        "status_processing": "3",
        "status": "OK",
        "order": {
            "id": "872102042",
            "number": "123831",
            "date_order": "dd/mm/yyyy",
            "itens": [
                {
                    "item": {
                        "id_product": "684451795",
                        "code": "VPOR",
                        "description": "Product 1",
                        "unit": "Un",
                        "quantity": "1.00",
                        "value": "31.76"
                    }
                },
                {
                    "item": {
                        "id_product": "684451091",
                        "code": "VSAP",
                        "description": "Product 2",
                        "unit": "Un",
                        "quantity": "1.00",
                        "value": "31.76"
                    }
                }
            ]
        }
    }
}

I searched Stack Overflow questions and tried some of the solutions people suggested, but they didn't work for me.

Here is a sample of what I used to access the data from the JSON:

df = pd.json_normalize(
    order_list,
    record_path=["return", "order", "itens"],
    meta=[
        ["return", "order", "id"],
        ["return", "order", "date_order"],
        ["return", "order", "number"],
    ],
)

But it doesn't work: the data is duplicated when I load it into a DataFrame.

Can anyone help me?

EDIT

Here is an example that I used:

Convert nested JSON to pandas DataFrame

And what I expected:

Asked Mar 30 at 14:04 by Cesar Augusto; edited Mar 31 at 16:53 by Blaztix.
  • If you flatten it, some data may have to be repeated. What other solutions did you try? You could add the links to the question (not in comments). What result do you expect? Showing it in the question would explain what you really need. – furas Commented Mar 30 at 14:09
  • Maybe it would be simpler to write normal code instead of using json_normalize. – furas Commented Mar 30 at 14:10
  • By normal code, do you mean writing a for loop? – Cesar Augusto Commented Mar 30 at 14:21
  • First you should show the expected result. If you want every item in a new row, you may need to use a for-loop or expand instead of normalize. – furas Commented Mar 30 at 14:22
  • What I want is every item in a new row. Thank you, I'll try using expand. – Cesar Augusto Commented Mar 30 at 14:27

2 Answers

Your code is fine. You are getting the data; perhaps you just want to specify which columns to keep (or maybe rename them)?

import json
import pandas as pd



data = '''   { "return": {
        "status_processing": "3",
        "status": "OK",
        "order": {
            "id": "872102042",
            "number": "123831",
            "date_order": "dd/mm/yyyy",
             "itens": [
                {
                    "item": {
                        "id_product": "684451795",
                        "code": "VPOR",
                        "description": "Product 1",
                        "unit": "Un",
                        "quantity": "1.00",
                        "value": "31.76"
                    }
                },
                {
                    "item": {
                        "id_product": "684451091",
                        "code": "VSAP",
                        "description": "Product 2",
                        "unit": "Un",
                        "quantity": "1.00",
                        "value": "31.76"
                    }
                }
            ]
        }
    }
}'''

order_list = json.loads(data)


df = pd.json_normalize(order_list, 
                       record_path= ["return", "order", "itens"],
                       meta=[["return", "order", "id"], ["return", "order", "number"], ["return", "order", "date_order"]])



# Keep the order-level columns plus the item description shown in the output below.
df = df[['return.order.id', 'return.order.number', 'return.order.date_order', 'item.description']]

Output:

print(df)
  return.order.id return.order.number return.order.date_order item.description
0       872102042              123831              dd/mm/yyyy        Product 1
1       872102042              123831              dd/mm/yyyy        Product 2
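
If renaming is what you are after, one possible approach is to keep only the last dotted component of each column name. This is a minimal sketch, not the only way to do it: it assumes the JSON uses the "itens" key as in the data above, it trims the item fields for brevity, and it only works when the shortened names do not collide with each other.

```python
import pandas as pd

# Trimmed copy of the order structure (assumption: the list key is "itens").
order_list = {
    "return": {
        "status": "OK",
        "order": {
            "id": "872102042",
            "number": "123831",
            "date_order": "dd/mm/yyyy",
            "itens": [
                {"item": {"id_product": "684451795", "description": "Product 1"}},
                {"item": {"id_product": "684451091", "description": "Product 2"}},
            ],
        },
    }
}

df = pd.json_normalize(
    order_list,
    record_path=["return", "order", "itens"],
    meta=[
        ["return", "order", "id"],
        ["return", "order", "number"],
        ["return", "order", "date_order"],
    ],
)

# "item.id_product" -> "id_product", "return.order.id" -> "id", and so on.
df.columns = [col.split(".")[-1] for col in df.columns]
print(df)
```

The order-level values (id, number, date_order) still repeat on every row. That is expected when one order with several items is flattened into one row per item.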

I don't know exactly what output you expect, but if you want every item in its own row, you could use ordinary code with a for-loop.

order_list = {
    "return": {
        "status_processing": "3",
        "status": "OK",
        "order": {
            "id": "872102042",
            "number": "123831",
            "date_order": "dd/mm/yyyy",
             "itens": [
                {
                    "item": {
                        "id_product": "684451795",
                        "code": "VPOR",
                        "description": "Product 1",
                        "unit": "Un",
                        "quantity": "1.00",
                        "value": "31.76"
                    }
                },
                {
                    "item": {
                        "id_product": "684451091",
                        "code": "VSAP",
                        "description": "Product 2",
                        "unit": "Un",
                        "quantity": "1.00",
                        "value": "31.76"
                    }
                }
            ]
        }
    }
}

import pandas as pd

data = []

order = order_list['return']['order']

for iten in order['itens']:
    for key, val in iten.items():
        row = {
            #'key': key, 
            'id': order['id'], 
            'date_order': order['date_order'], 
            'number': order['number'], 
            'id_product': val['id_product'],
            #'code': val['code'],
            #'description': val['description'],
            #'quantity': val['quantity'],
            #'value': val['value'],
        }
        data.append(row)

df = pd.DataFrame(data)
print(df)

Result:

          id  date_order  number id_product
0  872102042  dd/mm/yyyy  123831  684451795
1  872102042  dd/mm/yyyy  123831  684451091
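
A more compact variant of the same idea is sketched below. It merges the order-level fields with each item's fields in a list comprehension, so every item field ends up in the row without listing the keys by hand. The names `header` and `rows` are just illustrative, the "itens" key is assumed as in the data above, and the conversion of `quantity` and `value` to numbers is an optional extra that the question did not ask for.

```python
import pandas as pd

order_list = {
    "return": {
        "status_processing": "3",
        "status": "OK",
        "order": {
            "id": "872102042",
            "number": "123831",
            "date_order": "dd/mm/yyyy",
            "itens": [
                {"item": {"id_product": "684451795", "code": "VPOR",
                          "description": "Product 1", "unit": "Un",
                          "quantity": "1.00", "value": "31.76"}},
                {"item": {"id_product": "684451091", "code": "VSAP",
                          "description": "Product 2", "unit": "Un",
                          "quantity": "1.00", "value": "31.76"}},
            ],
        },
    }
}

order = order_list["return"]["order"]

# Order-level fields that should be repeated on every row.
header = {key: val for key, val in order.items() if key != "itens"}

# One row per item: order fields first, then the item's own fields.
rows = [{**header, **entry["item"]} for entry in order["itens"]]

df = pd.DataFrame(rows)

# The JSON stores numbers as strings; convert the numeric-looking columns.
df[["quantity", "value"]] = df[["quantity", "value"]].apply(pd.to_numeric)
print(df)
```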

If you need other information in the rows, you should show it in the question.
