admin管理员组

文章数量:1400409

I'm working on a script that processes an Excel file containing product information. The data includes both product codes (which are 5-digit numeric values) and quantities, but these values are sometimes in the same cells. I need to distinguish between product codes and quantities correctly. The issue is complicated because sometimes the quantity is also a 5-digit number, and product codes may appear in cells where quantities would be.

What I'm Trying to Do:

I have an Excel file with multiple sheets. Each sheet contains product data, and the product codes and quantities are in different columns but often appear in the same cells for some rows. I want to extract the product codes and quantities correctly. Product codes are 5-digit numbers, while quantities are numeric and can also sometimes be 5-digit numbers (e.g., 1.5, 100, or 12345).

The issue arises because I’m using a function to identify 5-digit product codes, but in some cases, this function mistakenly identifies quantities as product codes, especially when the quantity is also a 5-digit number.

The Issue:

The problem is that product codes are sometimes mixed with quantities in the same cell, and some quantities are also 5-digit numbers. My current approach is checking for 5-digit numbers and assuming those are product codes, but this leads to mistakes when the quantity is also a 5-digit number.

What I've Tried:

Here's the script I’ve written to extract product data:

import pandas as pd

file_path = "C:\\Users\\user\\Downloads\\afps1.xlsx"  # Update with your file path
xls = pd.ExcelFile(file_path)

# Extract company info
df_info = xls.parse("Co. Info", header=None)
company_info = {
    "Company Name": df_info.iloc[13, 2] if len(df_info) > 13 else None,
    "Mailing Address": df_info.iloc[14, 2] if len(df_info) > 14 else None,
    "Plant Name": df_info.iloc[15, 2] if len(df_info) > 15 else None,
    "Plant Address": df_info.iloc[16, 2] if len(df_info) > 16 else None
}

# Sheets to process
seafood_sheets = ["Wild Salmon", "Wild Fish & Shellfish", "CulturedSalmon&Fish", "Cultured Shellfish"]

data_rows = []

# Function to check if a value is a product code (5-digit numeric)
def is_product_code(value):
    if pd.isna(value):
        return False  # Return False if value is NaN
    return isinstance(value, (int, float)) and len(str(int(value))) == 5

for sheet in seafood_sheets:
    df = xls.parse(sheet, header=None)
    current_species_left = None
    current_species_right = None

    for idx, row in df.iterrows():
        left_col = str(row[0]).strip() if not pd.isna(row[0]) else ""
        right_col = str(row[5]).strip() if len(row) > 5 and not pd.isna(row[5]) else ""

        # Detect species section headers
        if left_col.isupper() and " - " in left_col:
            current_species_left = left_col
            continue
        if right_col.isupper() and " - " in right_col:
            current_species_right = right_col
            continue

        # Process left table
        if current_species_left and left_col:
            product_code = row[2] if not pd.isna(row[2]) and is_product_code(row[2]) else None
            quantity = row[3] if not is_product_code(row[3]) and not pd.isna(row[3]) else None
            value = row[4] if not pd.isna(row[4]) else None

            if quantity or value:
                data_rows.append({
                    "Species": current_species_left,
                    "Product Code": product_code,
                    "Product": left_col,
                    "Quantity": quantity,
                    "Value": value,
                    "Company Name": company_info["Company Name"],
                    "Mailing Address": company_info["Mailing Address"],
                    "Plant Name": company_info["Plant Name"],
                    "Plant Address": company_info["Plant Address"],
                    "Source Sheet": sheet
                })

        # Process right table
        if current_species_right and right_col:
            product_code = row[7] if len(row) > 7 and not pd.isna(row[7]) and is_product_code(row[7]) else None
            quantity = row[8] if len(row) > 8 and not is_product_code(row[8]) and not pd.isna(row[8]) else None
            value = row[9] if len(row) > 9 and not pd.isna(row[9]) else None

            if quantity or value:
                data_rows.append({
                    "Species": current_species_right,
                    "Product Code": product_code,
                    "Product": right_col,
                    "Quantity": quantity,
                    "Value": value,
                    "Company Name": company_info["Company Name"],
                    "Mailing Address": company_info["Mailing Address"],
                    "Plant Name": company_info["Plant Name"],
                    "Plant Address": company_info["Plant Address"],
                    "Source Sheet": sheet
                })

# Convert to DataFrame and save
final_df = pd.DataFrame(data_rows)
output_file = "processed_seafood_data.xlsx"
final_df.to_excel(output_file, index=False)

What I Need Help With:

How can I more accurately distinguish between product codes and quantities when they are both numeric and sometimes both appear as 5-digit numbers?

Is there a better method or logic I can use to identify these values correctly?

Any help would be greatly appreciated! Thank you in advance.

See image of excel doc

本文标签: excelHow to Distinguish Between Product Codes and Quantities in Mixed Cells in PythonStack Overflow