
I have a DataFrame that shows the codes belonging to each ID:

import pandas as pd

data_group = {
    'id': ['0111','0123'],
    'code': [['1', '2', '3'],['1','2']]
}
df_group = pd.DataFrame(data_group)

I also have a DataFrame with IDs, codes and dates. This is a sample:

data = {
    'codice': ['1', '2', '3', '1','1','1'],
    'id': ['0111', '0111', '0111', '0111','0123','0123'],
    'data1': ['2025-02-03 02:16:00', '2025-02-03 02:18:00', '2025-02-03 02:17:00', '2025-02-03 12:02:00','2025-02-03 12:02:00','2025-02-03 12:02:00'],
    'data2': ['2025-02-03 02:44:00', '2025-02-03 02:44:00', '2025-02-03 02:39:00', '2025-02-03 12:05:00','2025-02-03 12:06:00','2025-02-03 12:04:00']
}
df = pd.DataFrame(data) 

I want to find, for each ID, the date range that is common to all of its codes, i.e. the intersection of their data1/data2 intervals (for example, for '0111' the range shared by codes 1, 2 and 3 together, not just by 2 and 3 or by 1 and 2). The result I want is:

result = {
    'id': ['0111'],
    'data1': ['2025-02-03 02:18:00'],
    'data2': ['2025-02-03 02:39:00']
}

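To make the definition concrete: the common range is the intersection of the intervals, that is, the latest data1 and the earliest data2. A minimal sketch for '0111', assuming the three morning rows (one per code) are the intervals meant to overlap; the later 12:02 row for code 1 is left out by hand here:

```python
import pandas as pd

# The three '0111' intervals that overlap, one per code (1, 2, 3)
starts = pd.to_datetime(['2025-02-03 02:16:00', '2025-02-03 02:18:00', '2025-02-03 02:17:00'])
ends = pd.to_datetime(['2025-02-03 02:44:00', '2025-02-03 02:44:00', '2025-02-03 02:39:00'])

# Intersection of all intervals: latest start, earliest end
common_start = starts.max()
common_end = ends.min()

# The intersection is non-empty only if it starts no later than it ends
print(common_start, common_end, common_start <= common_end)
# prints: 2025-02-03 02:18:00 2025-02-03 02:39:00 True
```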
Asked Mar 24 at 14:38 by so.n, edited Mar 24 at 15:04 by mozway
  • How do you define a "common range of date"? Can you explain your example? Also your data constructor is invalid. – mozway Commented Mar 24 at 14:44
  • For example, for ID 0111 the data1/data2 intervals of codes 1, 2 and 3 overlap from '2025-02-03 02:18:00' to '2025-02-03 02:43:00' – so.n Commented Mar 24 at 14:51
  • That's not really explaining, more rephrasing. I would identify 2025-02-03 12:02:00 as the beginning of the range – mozway Commented Mar 24 at 14:51
  • I'll post my answer below and you can comment if this is not what you expect – mozway Commented Mar 24 at 14:52
  • Actually I got it now, but this is not really well explained. – mozway Commented Mar 24 at 15:03

1 Answer


The logic is not fully clear to me, but assuming your intervals are overlapping, that data1 is the start and data2 the end, and that you want the latest start and the earliest end (max/min), you can perform a groupby.agg, then filter the rows based on df_group:

# convert to datetime
df[['data1', 'data2']] = df[['data1', 'data2']].apply(pd.to_datetime)

# sort the values, identify overlapping dates
# form a new group, get the minimal interval
out = (df
 .sort_values(by=['id', 'data1', 'data2'])
 # a row starts a new block "n" when it begins after the previous row ends
 .assign(n=lambda x: x['data1'].gt(x['data2'].shift()).groupby(x['id']).cumsum())
 # per block: latest start, earliest end, and the set of codes involved
 .groupby(['id', 'n'], as_index=False)
 .agg({'data1': 'max', 'data2': 'min', 'codice': set})
 # filter the groups that are complete based on "df_group"
 .loc[lambda x: x['id'].map(df_group.set_index('id')['code'].apply(set))
                       .eq(x['codice'])
     ]
)

Output:

     id  n               data1               data2     codice
0  0111  0 2025-02-03 02:18:00 2025-02-03 02:39:00  {1, 3, 2}

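The `n` column is what makes the chain work: after sorting, a row opens a new block when its data1 is later than the previous row's data2, and a cumulative sum of those flags numbers the blocks. A small sketch of just that step on the four '0111' rows from the question, hard-coded here in sorted order:

```python
import pandas as pd

# The '0111' rows from the question, already sorted by data1 then data2
rows = pd.DataFrame({
    'codice': ['1', '3', '2', '1'],
    'data1': pd.to_datetime(['2025-02-03 02:16:00', '2025-02-03 02:17:00',
                             '2025-02-03 02:18:00', '2025-02-03 12:02:00']),
    'data2': pd.to_datetime(['2025-02-03 02:44:00', '2025-02-03 02:39:00',
                             '2025-02-03 02:44:00', '2025-02-03 12:05:00']),
})

# True where a row starts after the previous row has ended (a gap);
# the first row compares against NaT, which gives False
new_block = rows['data1'].gt(rows['data2'].shift())

# Running count of gaps gives a block number per row
rows['n'] = new_block.cumsum()
print(rows['n'].tolist())
# prints: [0, 0, 0, 1]
```

Each row is compared only with the end of the row directly before it, not with the latest end seen so far, so a row that overlaps an earlier, longer interval but starts after the previous row's end still opens a new block.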
For a dictionary, add:

out.set_index('id')[['data1', 'data2']].astype(str).T.to_dict()

Output:

{'0111': {'data1': '2025-02-03 02:18:00', 'data2': '2025-02-03 02:39:00'}}

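If a code can have several intervals, or overlaps should not depend on rows being adjacent after sorting, a sweep line is one alternative (a different technique from the groupby approach above): walk the start and end events in time order, count the open intervals of each required code, and record a range whenever all required codes are open at once. The helper `common_ranges` is hypothetical, and the sketch assumes data1 and data2 are datetimes, data1 <= data2 on every row, and closed intervals:

```python
import pandas as pd


def common_ranges(df, df_group):
    """Ranges during which every required code of an id is active at once.

    Hypothetical helper (sweep line). Assumes data1/data2 are datetimes,
    data1 <= data2 on every row, and closed intervals [data1, data2].
    """
    # Required codes per id, e.g. {'0111': {'1', '2', '3'}, ...}
    required = df_group.set_index('id')['code'].apply(set)
    rows = []
    for id_, g in df.groupby('id'):
        codes = required.get(id_, set())
        if not codes:
            continue
        # (time, kind, code) events; kind 0 = start sorts before kind 1 = end
        # at the same instant, so touching intervals count as overlapping
        events = sorted(
            [(s, 0, c) for s, c in zip(g['data1'], g['codice'])]
            + [(e, 1, c) for e, c in zip(g['data2'], g['codice'])]
        )
        open_count = {c: 0 for c in codes}  # open intervals per required code
        start = None  # start of the current common range, if any
        for t, kind, c in events:
            if c not in open_count:
                continue  # this code is not required for this id
            if kind == 0:
                open_count[c] += 1
                if start is None and all(open_count.values()):
                    start = t  # the last missing code just opened
            else:
                if start is not None and open_count[c] == 1:
                    # this code's last open interval closes the common range
                    rows.append({'id': id_, 'data1': start, 'data2': t})
                    start = None
                open_count[c] -= 1
    return pd.DataFrame(rows, columns=['id', 'data1', 'data2'])


# Sample data from the question
df_group = pd.DataFrame({'id': ['0111', '0123'],
                         'code': [['1', '2', '3'], ['1', '2']]})
df = pd.DataFrame({
    'codice': ['1', '2', '3', '1', '1', '1'],
    'id': ['0111', '0111', '0111', '0111', '0123', '0123'],
    'data1': ['2025-02-03 02:16:00', '2025-02-03 02:18:00', '2025-02-03 02:17:00',
              '2025-02-03 12:02:00', '2025-02-03 12:02:00', '2025-02-03 12:02:00'],
    'data2': ['2025-02-03 02:44:00', '2025-02-03 02:44:00', '2025-02-03 02:39:00',
              '2025-02-03 12:05:00', '2025-02-03 12:06:00', '2025-02-03 12:04:00'],
})
df[['data1', 'data2']] = df[['data1', 'data2']].apply(pd.to_datetime)

result = common_ranges(df, df_group)
print(result)
```

On the question's sample this gives a single row for '0111' (02:18 to 02:39) and nothing for '0123', whose code 2 never appears. Counting open intervals per code, rather than using a flag, keeps the sketch correct when one code has several intervals that overlap each other; under the closed-interval assumption, two intervals that only touch produce a zero-length range.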