admin管理员组文章数量:1122832
Each week our inventory reporting software outputs 15+ different inventory reports (one for each vendor respectively) in .xlsx format into one folder.
All the excel sheets are in an identical format but require a fair bit of manipulation prior to being “usuable” to the end user. After trying to learn python over the last week, and piecing together various bits of code I've managed to come up with a VERY rough script to perform the processing required.
The script works as desired but as it stands currently I have to manually input the individual excel sheet file path, and change the output file name, within the script and then run the script for each corresponding excel file. Doing this for 15+ files each week renders the script redundant as would take less time to just reformat within excel manually.
After more extensive forum reading, and trying to plug my script into others code solutions (.html) I'm at an impasse. I'm struggling come up with a function to batch process all excel files in one folder, and ignore all other files. Ideally the manipulated files would replace the original excel files in the same folder but if that's not possible then it's not a deal breaker.
To throw another curveball into the mix I have been writing my script and deploying my code on my mac (using a copy of the .xlsx file which I sent myself from work, and a mac os filepath) to test my script on, but I hope to deploy this script on the Windows system at work eventually.
I've included only my original, currently functioning script below.
Thanks all
#import pandas library
import pandas as pd
#import numpy
import numpy as np
#importing our excel to dataframe
df = pd.read_excel('/Users/christiane/Downloads/AGLC CJ Test.xls.xlsx')
#renaming our headers for each column
df = df.set_axis(['vendor code', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'current inventory', 'incoming inventory', 'm', 'Sales: Previous 4 weeks', 'o'], axis=1)
#convert 'vendor code' column data to string data
df['vendor code'] = df['vendor code'].astype(str)
#delete everything that doesn't start with a number from the bottom of the table
df = df[~df['vendor code'].str.contains("[a-zA-Z]").fillna(False)]
#convert 'vendor code' column data to interger data
df['vendor code'] = df['vendor code'].astype(int)
#sort values in column #1 from smallet to largest
df.sort_values(by='vendor code', ascending=True, inplace=True)
#add new column at the end input formula and populate downwards (=K1+L1)
df['expectedinventory'] = df.loc[:,['current inventory', 'incoming inventory']].sum(axis=1)
#move 'expected inventory' column from the end position to correct column position
name_col = df.pop('expectedinventory')
df.insert(12, 'expectedinventory', name_col)
#delete columns 'm' and 'o'
df = df.drop(['m', 'o'], axis=1)
#run formula for 4 weeks of back orders and populate column “sold in last 4 weeks” with the results
df['Sales Advice'] = df['expectedinventory'] - (( df['Sales: Previous 4 weeks']*1) + (0.1*( df['Sales: Previous 4 weeks'])*1))
#Don't know what this does but it works!
pd.set_option('future.no_silent_downcasting', True)
#change NaN values to Zero
df['Sales Advice'] = df['Sales Advice'].fillna(0)
inventory_data = pd.DataFrame(df)
cols = ['Sales Advice']
inventory_data.loc[:, cols] = inventory_data[cols].astype('float64').round()
#delete unnecessary columns
df = df.drop(['b', 'c', 'e', 'f','g','h', 'i', 'j'], axis=1)
#adding a right-most column with NaN values
df = df.reindex(columns=df.columns.tolist() + ['Ordering Now'])
#output excel file
df.to_excel('filteredaglctest.xlsx', index = False)
Each week our inventory reporting software outputs 15+ different inventory reports (one for each vendor respectively) in .xlsx format into one folder.
All the excel sheets are in an identical format but require a fair bit of manipulation prior to being “usuable” to the end user. After trying to learn python over the last week, and piecing together various bits of code I've managed to come up with a VERY rough script to perform the processing required.
The script works as desired but as it stands currently I have to manually input the individual excel sheet file path, and change the output file name, within the script and then run the script for each corresponding excel file. Doing this for 15+ files each week renders the script redundant as would take less time to just reformat within excel manually.
After more extensive forum reading, and trying to plug my script into others code solutions (https://python-forum.io/thread-10841.html) I'm at an impasse. I'm struggling come up with a function to batch process all excel files in one folder, and ignore all other files. Ideally the manipulated files would replace the original excel files in the same folder but if that's not possible then it's not a deal breaker.
To throw another curveball into the mix I have been writing my script and deploying my code on my mac (using a copy of the .xlsx file which I sent myself from work, and a mac os filepath) to test my script on, but I hope to deploy this script on the Windows system at work eventually.
I've included only my original, currently functioning script below.
Thanks all
#import pandas library
import pandas as pd
#import numpy
import numpy as np
#importing our excel to dataframe
df = pd.read_excel('/Users/christiane/Downloads/AGLC CJ Test.xls.xlsx')
#renaming our headers for each column
df = df.set_axis(['vendor code', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'current inventory', 'incoming inventory', 'm', 'Sales: Previous 4 weeks', 'o'], axis=1)
#convert 'vendor code' column data to string data
df['vendor code'] = df['vendor code'].astype(str)
#delete everything that doesn't start with a number from the bottom of the table
df = df[~df['vendor code'].str.contains("[a-zA-Z]").fillna(False)]
#convert 'vendor code' column data to interger data
df['vendor code'] = df['vendor code'].astype(int)
#sort values in column #1 from smallet to largest
df.sort_values(by='vendor code', ascending=True, inplace=True)
#add new column at the end input formula and populate downwards (=K1+L1)
df['expectedinventory'] = df.loc[:,['current inventory', 'incoming inventory']].sum(axis=1)
#move 'expected inventory' column from the end position to correct column position
name_col = df.pop('expectedinventory')
df.insert(12, 'expectedinventory', name_col)
#delete columns 'm' and 'o'
df = df.drop(['m', 'o'], axis=1)
#run formula for 4 weeks of back orders and populate column “sold in last 4 weeks” with the results
df['Sales Advice'] = df['expectedinventory'] - (( df['Sales: Previous 4 weeks']*1) + (0.1*( df['Sales: Previous 4 weeks'])*1))
#Don't know what this does but it works!
pd.set_option('future.no_silent_downcasting', True)
#change NaN values to Zero
df['Sales Advice'] = df['Sales Advice'].fillna(0)
inventory_data = pd.DataFrame(df)
cols = ['Sales Advice']
inventory_data.loc[:, cols] = inventory_data[cols].astype('float64').round()
#delete unnecessary columns
df = df.drop(['b', 'c', 'e', 'f','g','h', 'i', 'j'], axis=1)
#adding a right-most column with NaN values
df = df.reindex(columns=df.columns.tolist() + ['Ordering Now'])
#output excel file
df.to_excel('filteredaglctest.xlsx', index = False)
Share
Improve this question
edited Nov 22, 2024 at 4:51
Marcin Orlowski
75.6k11 gold badges127 silver badges152 bronze badges
asked Nov 22, 2024 at 3:59
Christian EChristian E
12 bronze badges
3
|
1 Answer
Reset to default 0from pathlib import Path
# APIs get imported here
# Main folder where excel files are found
FOLDER = "/home/user/Documents/excel_files" #Change this to your folder path!
EXT = ".xlsx"
def updateExcel(file):
print(f"---> Processing {file} <----")
#Insert your code here:
def main():
#Specift the path in which the files are found
folder_path = Path(FOLDER)
#Get a list of files with a specific extension
files_ext = [files_ext for file in folder_path.iterdir() if file.is_file() and file.sufix = EXT]
#Process the files
for file in files_ext:
updateExcel(FOLDER+"/"+file)
if __name__ == '__main__':
main()
本文标签:
版权声明:本文标题:list - Batch processing multiple excel files in folder via python script and outputting into folder, while ignoring other file t 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1736305710a1932688.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
AGLC CJ Test.xls.xlsx
is one of the files you need to process and you would want to loop your script to process 14 other files just like it, possibly with some variation regarding the format of input or output? What varies between runs? (other than the input file name and the output file name)? – Grismar Commented Nov 22, 2024 at 4:59