admin管理员组文章数量:1387380
Ok I cannot wrap my head around that.
I have two dataframes, first_df
and second_df
, where first_df
contains information about street segments, including the street name, start and end numbers of street segments, and whether the segment is for even or odd numbers. second_df
contains street numbers and their corresponding addresses.
The goal is to assign the correct "Secteur Elementaire" (elementary sector) to each street number in second_df
based on the information in first_df
. So it needs to pass three checks:
- Check if streetname exists (and create a subset dataframe containing only the elemnts corresponding to this streetname)
- Check if streetnumber is within the range defined by
first_df['Début']
andfirst_df['Fin']
- Check if it is even or odd, and assign the sector based on its whether the number is part of the even or odd sector.
The issue I'm facing is that the logic to determine the correct sector based on whether the street number is even or odd is not working as expected. The current implementation is assigning the wrong sectors in some cases.
first_df = pd.DataFrame({
'Voie': ['AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY'],
'Périmètre élémentaire': ['PROVENCAUX', 'AVRANCHES', 'MAISON BLANCHE', 'SCULPTEURS JACQUES'],
'Périmètre maternelle': ['AUVERGNATS-PROVENCAUX', 'AVRANCHES', 'MAISON BLANCHE', 'SCULPTEURS JACQUES'],
'Début': [142, 1, 73, 2],
'Fin': [998, 71, 999, 140],
'Partie': ['Pair', 'Impair', 'Impair', 'Pair']
})
second_df = pd.DataFrame({
'numero': [1, 2, 6, 7, 8, 9, 10, 12],
'nom_afnor': ['AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY'],
'Secteur Elementaire': ['tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd']
})
def sort_df(first_df, second_df):
for bano_index, row in second_df.iterrows():
street_number = row['numero']
street_name = row['nom_afnor']
#print(f'first loop: {num_de_rue}')
if street_name in first_df['Voie'].values: # check if street name exists
reims_filtered = first_df.loc[first_df['Voie'] == street_name] # if it does create a dataframe containing only the elements matching the street name
#print(reims_filtered)
for reims_index, reims_matching_row in reims_filtered.iterrows(): # iterate over the rows the filtered dataframe
#print(reims_matching_row)
if street_number >= reims_matching_row['Début'] and street_number <= reims_matching_row['Fin']: # check if street number is between range of street segment
#print(f'Check range {street_number} {reims_matching_row['Périmètre élémentaire']}')
if street_number % 2 == 0: # check if street number is even
print(reims_index, street_number, reims_matching_row['Partie'])
if reims_matching_row['Partie'] == 'Pair': # if it is even, then check which sector should be taken to be assigned based on street segment and its corresponding odd/even sector
print(f'Check column {street_number} {reims_matching_row['Périmètre élémentaire']}')
sector_to_assign = reims_matching_row['Périmètre élémentaire']
second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
break
else:
print(f'Check odd {street_number} {reims_matching_row['Périmètre élémentaire']}')
sector_to_assign = reims_matching_row['Périmètre élémentaire']
second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
break
return second_df
sort_df(first_df, second_df)
Output of running this function:
Check odd 1 AVRANCHES
1 2 Impair
1 6 Impair
Check odd 7 AVRANCHES
1 8 Impair
Check odd 9 AVRANCHES
1 10 Impair
1 12 Impair
numero nom_afnor Secteur Elementaire
0 1 AVENUE D EPERNAY AVRANCHES
1 2 AVENUE D EPERNAY tbd
2 6 AVENUE D EPERNAY tbd
3 7 AVENUE D EPERNAY AVRANCHES
4 8 AVENUE D EPERNAY tbd
5 9 AVENUE D EPERNAY AVRANCHES
6 10 AVENUE D EPERNAY tbd
7 12 AVENUE D EPERNAY tbd
Clearly there is an issue with the odd or even logic, especially when trying to assign reims_matching_row['Partie'] == 'Pair'
.
If someone has any leads…
Ok I cannot wrap my head around that.
I have two dataframes, first_df
and second_df
, where first_df
contains information about street segments, including the street name, start and end numbers of street segments, and whether the segment is for even or odd numbers. second_df
contains street numbers and their corresponding addresses.
The goal is to assign the correct "Secteur Elementaire" (elementary sector) to each street number in second_df
based on the information in first_df
. So it needs to pass three checks:
- Check if streetname exists (and create a subset dataframe containing only the elemnts corresponding to this streetname)
- Check if streetnumber is within the range defined by
first_df['Début']
andfirst_df['Fin']
- Check if it is even or odd, and assign the sector based on its whether the number is part of the even or odd sector.
The issue I'm facing is that the logic to determine the correct sector based on whether the street number is even or odd is not working as expected. The current implementation is assigning the wrong sectors in some cases.
first_df = pd.DataFrame({
'Voie': ['AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY'],
'Périmètre élémentaire': ['PROVENCAUX', 'AVRANCHES', 'MAISON BLANCHE', 'SCULPTEURS JACQUES'],
'Périmètre maternelle': ['AUVERGNATS-PROVENCAUX', 'AVRANCHES', 'MAISON BLANCHE', 'SCULPTEURS JACQUES'],
'Début': [142, 1, 73, 2],
'Fin': [998, 71, 999, 140],
'Partie': ['Pair', 'Impair', 'Impair', 'Pair']
})
second_df = pd.DataFrame({
'numero': [1, 2, 6, 7, 8, 9, 10, 12],
'nom_afnor': ['AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY'],
'Secteur Elementaire': ['tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd']
})
def sort_df(first_df, second_df):
for bano_index, row in second_df.iterrows():
street_number = row['numero']
street_name = row['nom_afnor']
#print(f'first loop: {num_de_rue}')
if street_name in first_df['Voie'].values: # check if street name exists
reims_filtered = first_df.loc[first_df['Voie'] == street_name] # if it does create a dataframe containing only the elements matching the street name
#print(reims_filtered)
for reims_index, reims_matching_row in reims_filtered.iterrows(): # iterate over the rows the filtered dataframe
#print(reims_matching_row)
if street_number >= reims_matching_row['Début'] and street_number <= reims_matching_row['Fin']: # check if street number is between range of street segment
#print(f'Check range {street_number} {reims_matching_row['Périmètre élémentaire']}')
if street_number % 2 == 0: # check if street number is even
print(reims_index, street_number, reims_matching_row['Partie'])
if reims_matching_row['Partie'] == 'Pair': # if it is even, then check which sector should be taken to be assigned based on street segment and its corresponding odd/even sector
print(f'Check column {street_number} {reims_matching_row['Périmètre élémentaire']}')
sector_to_assign = reims_matching_row['Périmètre élémentaire']
second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
break
else:
print(f'Check odd {street_number} {reims_matching_row['Périmètre élémentaire']}')
sector_to_assign = reims_matching_row['Périmètre élémentaire']
second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
break
return second_df
sort_df(first_df, second_df)
Output of running this function:
Check odd 1 AVRANCHES
1 2 Impair
1 6 Impair
Check odd 7 AVRANCHES
1 8 Impair
Check odd 9 AVRANCHES
1 10 Impair
1 12 Impair
numero nom_afnor Secteur Elementaire
0 1 AVENUE D EPERNAY AVRANCHES
1 2 AVENUE D EPERNAY tbd
2 6 AVENUE D EPERNAY tbd
3 7 AVENUE D EPERNAY AVRANCHES
4 8 AVENUE D EPERNAY tbd
5 9 AVENUE D EPERNAY AVRANCHES
6 10 AVENUE D EPERNAY tbd
7 12 AVENUE D EPERNAY tbd
Clearly there is an issue with the odd or even logic, especially when trying to assign reims_matching_row['Partie'] == 'Pair'
.
If someone has any leads…
Share Improve this question asked Mar 17 at 23:59 LouisLouis 3411 gold badge2 silver badges12 bronze badges 1- 1 Les noms «premier dataframe» et «deuxième dataframe» are just terrible names. Please choose meaningful identifiers. Names affect how we think about a problem. You will find it easier to decompose the current problem into feasible subtasks once you've renamed that pair of dataframes so they better align with the business problem at hand. – J_H Commented Mar 18 at 16:32
3 Answers
Reset to default 0Your code is breaking too early in the clause starting here:
if street_number % 2 == 0: # check if street number is even
if the first rows "Partie" is not "Pair", you won't assign anything, and you will hit the break below your else statement, exiting the loop.
To fix it, we can add the requirement that it matches to your conditional:
def sort_df(first_df, second_df):
for bano_index, row in second_df.iterrows():
street_number = row['numero']
street_name = row['nom_afnor']
#print(f'first loop: {num_de_rue}')
if street_name in first_df['Voie'].values: # check if street name exists
reims_filtered = first_df.loc[first_df['Voie'] == street_name] # if it does create a dataframe containing only the elements matching the street name
#print(reims_filtered)
for reims_index, reims_matching_row in reims_filtered.iterrows(): # iterate over the rows the filtered dataframe
#print(reims_matching_row)
if street_number >= reims_matching_row['Début'] and street_number <= reims_matching_row['Fin']: # check if street number is between range of street segment
#print(f'Check range {street_number} {reims_matching_row['Périmètre élémentaire']}')
if street_number % 2 == 0 and reims_matching_row['Partie'] == 'Pair': # check if street number is even
#print(reims_index, street_number, reims_matching_row['Partie'])
print(f'Check column {street_number} {reims_matching_row["Périmètre élémentaire"]}')
sector_to_assign = reims_matching_row["Périmètre élémentaire"]
second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
break
elif street_number % 2 != 0 and reims_matching_row['Partie'] == 'Impair':
print(f'Check odd {street_number} {reims_matching_row["Périmètre élémentaire"]}')
sector_to_assign = reims_matching_row["Périmètre élémentaire"]
second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
break
break
return second_df
Depending on the size of the data, you could imagine a subset by street, then a join on a range to speed things up.
You can use conditional_join from pyjanitor to merge both dataframes based on your conditions.
# pip install pyjanitor
import janitor
final_df = (second_df.assign(p=(second_df['numero']%2).map({0: 'Pair', 1: 'Impair'}))
.conditional_join(first_df, ('numero', 'Début', '>='), ('numero', 'Fin', '<='),
('nom_afnor', 'Voie', '=='), ('p', 'Partie', '=='))
.assign(**{"Secteur Elementaire":lambda d: d['Périmètre élémentaire']})
.loc[:, second_df.columns])
End result:
numero nom_afnor Secteur Elementaire
1 AVENUE D EPERNAY AVRANCHES
2 AVENUE D EPERNAY SCULPTEURS JACQUES
6 AVENUE D EPERNAY SCULPTEURS JACQUES
7 AVENUE D EPERNAY AVRANCHES
8 AVENUE D EPERNAY SCULPTEURS JACQUES
9 AVENUE D EPERNAY AVRANCHES
10 AVENUE D EPERNAY SCULPTEURS JACQUES
12 AVENUE D EPERNAY SCULPTEURS JACQUES
You can use a list comprehension with bitwise filtering to get each appropriate elementary sector then convert that to a series and replace the 'Secteur Elementaire' column with the wanted values. For filtering odd and even, I used a dictionary to map 0 to 'Pair' and 1 to 'Impair'.
odd_even = {0: 'Pair', 1: 'Impair'} #Make a dictionary to map odd and evens
second_df['Secteur Elementaire'] = pd.Series([
(filtered.iloc[0] if not filtered.empty else 'tbd') #If the filter can't find a result
for _, row in second_df.iterrows()
for filtered in [first_df.loc[
(first_df['Voie'] == row['nom_afnor']) & #Filter bitwise
(row['numero'] <= first_df['Fin']) &
(row['numero'] >= first_df['Début']) &
(first_df['Partie'] == odd_even[row['numero'] % 2]), #Use the dictionary
'Périmètre élémentaire' #The column you want
]]
], index=second_df.index) #Need to specify the index for correct order
Since the series will be applied according to the index, you can use this principle to perform this on subsections of the dataframe if your dataframe is large.
Alternatively, you can make it a function and use apply, which might be more convenient and easier to debug.
def find_perimeter(row):
filtered = first_df.loc[
(first_df['Voie'] == row['nom_afnor']) &
(row['numero'] <= first_df['Fin']) &
(row['numero'] >= first_df['Début']) &
(first_df['Partie'] == odd_even[row['numero'] % 2]),
'Périmètre élémentaire'
]
return filtered.iloc[0] if not filtered.empty else 'tbd'
second_df['Secteur Elementaire'] = second_df.apply(find_perimeter, axis=1)
Both of these will fill your 'Secteur Elementaire' column with the filtered values. Be aware, if a street could fit multiple possible sectors, it will pick the sequential first in the dataframe because of filtered.iloc[0]
.
本文标签:
版权声明:本文标题:python - Correctly Assign Street Sectors Based on EvenOdd Street Numbers and Street Segment Ranges - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1744530388a2610999.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论