admin管理员组

文章数量:1401673

I have 2 dataframes that are the results of 2 queries, returned in JSON. One of the dataframes sap_scrapped looks like this and was generated by:

sap_scrapped = pd.DataFrame(response.json()["d"]["results"]))
       Material  Scrapped
0  08-008099-00         2
1  10-000001-00         5

The other dataframe sap_warehouse looks like this and was generated by:

sap_warehouse = pd.DataFrame(response.json()["d"]["results"]))
       Material  SSP  Scrapped
0  10-000001-00    0         0
1  10-789001-00   10         7

I want to compare each value in the sap_scrapped 'Material' column to the sap_warehouse 'Material' column. If the scrapped material is not found anywhere in sap_warehouse, then I want to insert that missing material to the sap_warehouse dataframe. Using the data above, I'd want to insert 08-008099-00 into the sap_warehouse dataframe into the 'Material' column, zero in the SSP column and copy the 'Scrapped' value from the sap_scrapped dataframe (2) into 'Scrapped' in sap_warehouse.

How can I do that?

I have 2 dataframes that are the results of 2 queries, returned in JSON. One of the dataframes sap_scrapped looks like this and was generated by:

sap_scrapped = pd.DataFrame(response.json()["d"]["results"]))
       Material  Scrapped
0  08-008099-00         2
1  10-000001-00         5

The other dataframe sap_warehouse looks like this and was generated by:

sap_warehouse = pd.DataFrame(response.json()["d"]["results"]))
       Material  SSP  Scrapped
0  10-000001-00    0         0
1  10-789001-00   10         7

I want to compare each value in the sap_scrapped 'Material' column to the sap_warehouse 'Material' column. If the scrapped material is not found anywhere in sap_warehouse, then I want to insert that missing material to the sap_warehouse dataframe. Using the data above, I'd want to insert 08-008099-00 into the sap_warehouse dataframe into the 'Material' column, zero in the SSP column and copy the 'Scrapped' value from the sap_scrapped dataframe (2) into 'Scrapped' in sap_warehouse.

How can I do that?

Share Improve this question edited Mar 23 at 18:20 wjandrea 33.2k10 gold badges69 silver badges98 bronze badges asked Mar 23 at 17:25 JoeJoe 12 bronze badges 3
  • What should happen if the scrapped material is found in sap_warehouse? Please edit to clarify and add desired output. See How to make good reproducible pandas examples. (See also minimal reproducible example.) – wjandrea Commented Mar 23 at 18:21
  • Another minor reason to make a minimal reproducible example is that you only have one response here, which is confusing on first glance, but obviously that would be a different object depending on the context. – wjandrea Commented Mar 23 at 18:35
  • Please provide enough code so others can better understand or reproduce the problem. – Community Bot Commented Mar 24 at 10:31
Add a comment  | 

2 Answers 2

Reset to default 1

You can use data-fingerprint tool: https://pypi./project/data-fingerprint/#description to check differencess between those two dataframes, and than based on the difference insert the ones that are missing, here is an example:

import pandas as pd
import polars as pl

from data_fingerprint.srcparator import get_data_report
from data_fingerprint.src.models import DataReport
from data_fingerprint.src.utils import get_dataframe

if __name__ == "__main__":
    sap_scrapped: pl.DataFrame = pl.DataFrame(
        {"Material": ["08-008099-0", "10-000001-00"], "Scrapped": [2, 5]}
    )
    sap_warehouse: pl.DataFrame = pl.DataFrame(
        {
            "Material": ["10-000001-00", "10-789001-00"],
            "SSP": [0, 10],
            "Scrapped": [0, 7],
        }
    )
    
    # get difference information
    report: DataReport = get_data_report(
        sap_scrapped,
        sap_warehouse.drop("Scrapped"),
        "sap_scrapped",
        "sap_warehouse",
        grouping_columns=["Material"],
    )
    
    # transform difference to dataframe
    difference: pl.DataFrame = get_dataframe(report)
    
    # filter out difference from "sap_scrapped"
    to_insert_materials: pl.DataFrame = difference.filter(
        pl.col("source") == "sap_scrapped"
    )

    # gather information
    to_insert: pl.DataFrame = sap_scrapped.filter(
        pl.col("Material").is_in(to_insert_materials["Material"])
    ).with_columns(pl.lit(0).alias("SSP"))
    print(to_insert)

You can also use pandas.DataFrame instead of polars.DataFrame

Here's one approach:

out = (
    sap_warehouse.set_index('Material')
    bine_first(
        sap_scrapped.set_index('Material').assign(SSP=0)
        )
    .reset_index()
    )

Output:

       Material  SSP  Scrapped
0  08-008099-00    0         2
1  10-000001-00    0         0
2  10-789001-00   10         7

Explanation

  • Use .set_index to make 'Material' the index for both dfs and apply bine_first.
  • Add 'SSP' to sap_scrapped as '0' for missing values via .assign.
  • Reset the index with .reset_index.

Data used

import pandas as pd

data = {'Material': {0: '08-008099-00', 1: '10-000001-00'}, 
        'Scrapped': {0: 2, 1: 5}}
sap_scrapped = pd.DataFrame(data)

data2 = {'Material': {0: '10-000001-00', 1: '10-789001-00'}, 
         'SSP': {0: 0, 1: 10}, 'Scrapped': {0: 0, 1: 7}}
sap_warehouse = pd.DataFrame(data2)

本文标签: pythonHow to compare dataframes and add to dataframe if neccessaryStack Overflow