I am quite new to pandas. I need to select the records between two dates.
I have tried a range of methods but can't seem to get it to work. I have included a cut-down sample of the CSV data I am working with.
Each column is a date, so none of the documentation I have found matches this data structure.
Thanks for any help.
Sample CSV file
asked Feb 20 at 12:02 by JDP
Comment: Images of data are not reproducible; please provide a minimal reproducible example as code/text. Also clearly explain what the index/columns are and what the exact expected output is. – mozway, Feb 20 at 12:04
2 Answers
Here is a script that selects the columns between two dates. It should be reasonably fast:
import pandas as pd
file_path = "file.xlsx" # Update with the correct file path
df = pd.read_excel(file_path)
# Change the dates as needed (here 04/01/2025 to 06/01/2025).
selected_columns = df[["Fname", "Lname"] + list(df.loc[:, "04/01/2025":"06/01/2025"].columns)]
print(selected_columns)
If you don't need Fname and Lname, remove them from the selection and use the line below:
selected_columns = df[list(df.loc[:, "04/01/2025":"06/01/2025"].columns)]
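To make the behaviour of the label slice easier to check, here is a minimal, self-contained sketch on made-up data (the DataFrame below is hypothetical and the values in the 03/01/2025 and 07/01/2025 columns are invented). It assumes the column labels are plain strings in DD/MM/YYYY form. Note that the slice works on column position between the two labels, not on calendar order, so it relies on the date columns already being in order and on both boundary labels existing exactly once.

```python
import pandas as pd

# Hypothetical sample data: two name columns followed by date-labelled columns.
df = pd.DataFrame({
    "Fname": ["Owen", "Edward"],
    "Lname": ["Richardson", "Jones"],
    "03/01/2025": [101, 102],  # invented values
    "04/01/2025": [128, 148],
    "05/01/2025": [114, 144],
    "06/01/2025": [239, 182],
    "07/01/2025": [201, 202],  # invented values
})

# Label-based slicing on columns includes both end labels.
date_part = df.loc[:, "04/01/2025":"06/01/2025"]

# Put the name columns back in front of the selected date columns.
selected = pd.concat([df[["Fname", "Lname"]], date_part], axis=1)
print(selected)
```

The 03/01/2025 and 07/01/2025 columns fall outside the slice and are dropped, while both boundary columns are kept.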
To keep the script from failing when one of the boundary columns is missing, use:
try:
    date_columns = list(df.loc[:, "04/01/2025":"06/01/2025"].columns)
except KeyError:
    print("Error: The specified date range columns do not exist in the dataset.")
    date_columns = []  # Prevents errors in the next step
selected_columns = df[["Fname"] + date_columns]
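An alternative to catching the KeyError is to check for the two boundary labels before slicing. The sketch below is only an illustration under the assumption that the column labels are unique strings; the helper name `slice_columns_if_present` is hypothetical and not part of pandas.

```python
import pandas as pd

def slice_columns_if_present(df, start, end):
    """Return the column labels from start to end (inclusive).

    Returns an empty list if either boundary label is missing.
    Hypothetical helper for illustration; assumes unique column labels.
    """
    if start not in df.columns or end not in df.columns:
        return []
    return list(df.loc[:, start:end].columns)

# Made-up one-row frame, just to exercise the helper.
df = pd.DataFrame({
    "Fname": ["Owen"],
    "04/01/2025": [128],
    "05/01/2025": [114],
    "06/01/2025": [239],
})

found = slice_columns_if_present(df, "04/01/2025", "06/01/2025")
missing = slice_columns_if_present(df, "04/01/2025", "09/01/2025")
print(found)    # all three date labels
print(missing)  # empty list, because 09/01/2025 is not a column
```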
Assuming you have non-date columns and date-like columns, you could convert the column labels to dates with pd.to_datetime and errors='coerce'. Select the non-date columns with isna, and the wanted dates with between, then perform boolean indexing on the columns and select them:
dates = pd.to_datetime(df.columns, errors='coerce', format='%d/%m/%Y')
m = dates.to_series().between(pd.Timestamp('2025-01-04'),
                              pd.Timestamp('2025-01-06'),
                              inclusive='both')
out = df.loc[:, dates.isna() | m.values]
Output:
     Fname       Lname  04/01/2025  05/01/2025  06/01/2025
0     Owen  Richardson         128         114         239
1   Edward       Jones         148         144         182
2   Steven     Cameron         228         272         140
3    Aldus      Turner         281         139         171
4  Dainton      Wright         269         176         142
5    Sofia    Harrison         100         103         154
6  Heather       Evans         155         163         201
7   Stella      Harris         126         183         157
8    Joyce       Smith         251         143         229
9    Tyler        Hill         299         293         218
If you just want the date-like columns:
df[df.columns[m]]
   04/01/2025  05/01/2025  06/01/2025
0         128         114         239
1         148         144         182
2         228         272         140
3         281         139         171
4         269         176         142
5         100         103         154
6         155         163         201
7         126         183         157
8         251         143         229
9         299         293         218
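The datetime-based approach above could be wrapped into a small reusable function. This is a sketch rather than a definitive implementation: the function name `select_date_columns` and its parameters are hypothetical, the sample frame is made up (the 07/01/2025 value is invented), and `fmt` is assumed to match the column labels. Because it compares parsed dates instead of slicing by position, it should also cope with date columns that are out of order.

```python
import pandas as pd

def select_date_columns(df, start, end, fmt="%d/%m/%Y", keep_other=True):
    """Select columns whose labels parse as dates within [start, end].

    Hypothetical helper for illustration. Labels that do not parse with
    `fmt` become NaT and are treated as non-date columns.
    """
    dates = pd.to_datetime(df.columns, errors="coerce", format=fmt)
    # Comparisons against NaT are False, so non-date columns drop out here.
    in_range = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    mask = (in_range | dates.isna()) if keep_other else in_range
    return df.loc[:, mask]

# Made-up frame with the date columns deliberately out of order.
df = pd.DataFrame({
    "Fname": ["Owen"],
    "Lname": ["Richardson"],
    "07/01/2025": [1],  # invented value
    "04/01/2025": [128],
    "06/01/2025": [239],
    "05/01/2025": [114],
})

with_names = select_date_columns(df, "2025-01-04", "2025-01-06")
dates_only = select_date_columns(df, "2025-01-04", "2025-01-06", keep_other=False)
print(with_names)
print(dates_only)
```

The selected columns keep their original left-to-right order; sorting them by date would be a separate step.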