admin管理员组

文章数量:1387380

Ok I cannot wrap my head around that.

I have two dataframes, first_df and second_df, where first_df contains information about street segments, including the street name, start and end numbers of street segments, and whether the segment is for even or odd numbers. second_df contains street numbers and their corresponding addresses.

The goal is to assign the correct "Secteur Elementaire" (elementary sector) to each street number in second_df based on the information in first_df. So it needs to pass three checks:

  1. Check if streetname exists (and create a subset dataframe containing only the elemnts corresponding to this streetname)
  2. Check if streetnumber is within the range defined by first_df['Début'] and first_df['Fin']
  3. Check if it is even or odd, and assign the sector based on its whether the number is part of the even or odd sector.

The issue I'm facing is that the logic to determine the correct sector based on whether the street number is even or odd is not working as expected. The current implementation is assigning the wrong sectors in some cases.

first_df = pd.DataFrame({
    'Voie': ['AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY'],
    'Périmètre élémentaire': ['PROVENCAUX', 'AVRANCHES', 'MAISON BLANCHE', 'SCULPTEURS JACQUES'],
    'Périmètre maternelle': ['AUVERGNATS-PROVENCAUX', 'AVRANCHES', 'MAISON BLANCHE', 'SCULPTEURS JACQUES'],
    'Début': [142, 1, 73, 2],
    'Fin': [998, 71, 999, 140],
    'Partie': ['Pair', 'Impair', 'Impair', 'Pair']
})

second_df = pd.DataFrame({
    'numero': [1, 2, 6, 7, 8, 9, 10, 12],
    'nom_afnor': ['AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY'],
    'Secteur Elementaire': ['tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd']
})


def sort_df(first_df, second_df):
    for bano_index, row in second_df.iterrows():
        street_number = row['numero']
        street_name = row['nom_afnor']
        #print(f'first loop: {num_de_rue}')

        if street_name in first_df['Voie'].values: # check if street name exists
            reims_filtered = first_df.loc[first_df['Voie'] == street_name] # if it does create a dataframe containing only the elements matching the street name
            #print(reims_filtered)
            for reims_index, reims_matching_row in reims_filtered.iterrows(): # iterate over the rows the filtered dataframe
                #print(reims_matching_row)
                if street_number >= reims_matching_row['Début'] and street_number <= reims_matching_row['Fin']: # check if street number is between range of street segment
                    #print(f'Check range {street_number} {reims_matching_row['Périmètre élémentaire']}')
                    if street_number % 2 == 0: # check if street number is even
                        print(reims_index, street_number, reims_matching_row['Partie'])
                        if reims_matching_row['Partie'] == 'Pair': # if it is even, then check which sector should be taken to be assigned based on street segment and its corresponding odd/even sector
                            print(f'Check column {street_number} {reims_matching_row['Périmètre élémentaire']}')
                            sector_to_assign = reims_matching_row['Périmètre élémentaire']
                            second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
                            break
                    else:
                        print(f'Check odd {street_number} {reims_matching_row['Périmètre élémentaire']}')
                        sector_to_assign = reims_matching_row['Périmètre élémentaire']
                        second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
                    break

    return second_df

sort_df(first_df, second_df)

Output of running this function:

Check odd 1 AVRANCHES
1 2 Impair
1 6 Impair
Check odd 7 AVRANCHES
1 8 Impair
Check odd 9 AVRANCHES
1 10 Impair
1 12 Impair
   numero         nom_afnor Secteur Elementaire
0       1  AVENUE D EPERNAY           AVRANCHES
1       2  AVENUE D EPERNAY                 tbd
2       6  AVENUE D EPERNAY                 tbd
3       7  AVENUE D EPERNAY           AVRANCHES
4       8  AVENUE D EPERNAY                 tbd
5       9  AVENUE D EPERNAY           AVRANCHES
6      10  AVENUE D EPERNAY                 tbd
7      12  AVENUE D EPERNAY                 tbd

Clearly there is an issue with the odd or even logic, especially when trying to assign reims_matching_row['Partie'] == 'Pair'.

If someone has any leads…

Ok I cannot wrap my head around that.

I have two dataframes, first_df and second_df, where first_df contains information about street segments, including the street name, start and end numbers of street segments, and whether the segment is for even or odd numbers. second_df contains street numbers and their corresponding addresses.

The goal is to assign the correct "Secteur Elementaire" (elementary sector) to each street number in second_df based on the information in first_df. So it needs to pass three checks:

  1. Check if streetname exists (and create a subset dataframe containing only the elemnts corresponding to this streetname)
  2. Check if streetnumber is within the range defined by first_df['Début'] and first_df['Fin']
  3. Check if it is even or odd, and assign the sector based on its whether the number is part of the even or odd sector.

The issue I'm facing is that the logic to determine the correct sector based on whether the street number is even or odd is not working as expected. The current implementation is assigning the wrong sectors in some cases.

first_df = pd.DataFrame({
    'Voie': ['AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY'],
    'Périmètre élémentaire': ['PROVENCAUX', 'AVRANCHES', 'MAISON BLANCHE', 'SCULPTEURS JACQUES'],
    'Périmètre maternelle': ['AUVERGNATS-PROVENCAUX', 'AVRANCHES', 'MAISON BLANCHE', 'SCULPTEURS JACQUES'],
    'Début': [142, 1, 73, 2],
    'Fin': [998, 71, 999, 140],
    'Partie': ['Pair', 'Impair', 'Impair', 'Pair']
})

second_df = pd.DataFrame({
    'numero': [1, 2, 6, 7, 8, 9, 10, 12],
    'nom_afnor': ['AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY', 'AVENUE D EPERNAY'],
    'Secteur Elementaire': ['tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd', 'tbd']
})


def sort_df(first_df, second_df):
    for bano_index, row in second_df.iterrows():
        street_number = row['numero']
        street_name = row['nom_afnor']
        #print(f'first loop: {num_de_rue}')

        if street_name in first_df['Voie'].values: # check if street name exists
            reims_filtered = first_df.loc[first_df['Voie'] == street_name] # if it does create a dataframe containing only the elements matching the street name
            #print(reims_filtered)
            for reims_index, reims_matching_row in reims_filtered.iterrows(): # iterate over the rows the filtered dataframe
                #print(reims_matching_row)
                if street_number >= reims_matching_row['Début'] and street_number <= reims_matching_row['Fin']: # check if street number is between range of street segment
                    #print(f'Check range {street_number} {reims_matching_row['Périmètre élémentaire']}')
                    if street_number % 2 == 0: # check if street number is even
                        print(reims_index, street_number, reims_matching_row['Partie'])
                        if reims_matching_row['Partie'] == 'Pair': # if it is even, then check which sector should be taken to be assigned based on street segment and its corresponding odd/even sector
                            print(f'Check column {street_number} {reims_matching_row['Périmètre élémentaire']}')
                            sector_to_assign = reims_matching_row['Périmètre élémentaire']
                            second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
                            break
                    else:
                        print(f'Check odd {street_number} {reims_matching_row['Périmètre élémentaire']}')
                        sector_to_assign = reims_matching_row['Périmètre élémentaire']
                        second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
                    break

    return second_df

sort_df(first_df, second_df)

Output of running this function:

Check odd 1 AVRANCHES
1 2 Impair
1 6 Impair
Check odd 7 AVRANCHES
1 8 Impair
Check odd 9 AVRANCHES
1 10 Impair
1 12 Impair
   numero         nom_afnor Secteur Elementaire
0       1  AVENUE D EPERNAY           AVRANCHES
1       2  AVENUE D EPERNAY                 tbd
2       6  AVENUE D EPERNAY                 tbd
3       7  AVENUE D EPERNAY           AVRANCHES
4       8  AVENUE D EPERNAY                 tbd
5       9  AVENUE D EPERNAY           AVRANCHES
6      10  AVENUE D EPERNAY                 tbd
7      12  AVENUE D EPERNAY                 tbd

Clearly there is an issue with the odd or even logic, especially when trying to assign reims_matching_row['Partie'] == 'Pair'.

If someone has any leads…

Share Improve this question asked Mar 17 at 23:59 LouisLouis 3411 gold badge2 silver badges12 bronze badges 1
  • 1 Les noms «premier dataframe» et «deuxième dataframe» are just terrible names. Please choose meaningful identifiers. Names affect how we think about a problem. You will find it easier to decompose the current problem into feasible subtasks once you've renamed that pair of dataframes so they better align with the business problem at hand. – J_H Commented Mar 18 at 16:32
Add a comment  | 

3 Answers 3

Reset to default 0

Your code is breaking too early in the clause starting here:


if street_number % 2 == 0: # check if street number is even 

if the first rows "Partie" is not "Pair", you won't assign anything, and you will hit the break below your else statement, exiting the loop.

To fix it, we can add the requirement that it matches to your conditional:

def sort_df(first_df, second_df):
    for bano_index, row in second_df.iterrows():
        street_number = row['numero']
        street_name = row['nom_afnor']
        #print(f'first loop: {num_de_rue}')

        if street_name in first_df['Voie'].values: # check if street name exists
            reims_filtered = first_df.loc[first_df['Voie'] == street_name] # if it does create a dataframe containing only the elements matching the street name
            #print(reims_filtered)
            for reims_index, reims_matching_row in reims_filtered.iterrows(): # iterate over the rows the filtered dataframe
                #print(reims_matching_row)
                if street_number >= reims_matching_row['Début'] and street_number <= reims_matching_row['Fin']: # check if street number is between range of street segment
                    #print(f'Check range {street_number} {reims_matching_row['Périmètre élémentaire']}')
                    if street_number % 2 == 0 and reims_matching_row['Partie'] == 'Pair': # check if street number is even
                        #print(reims_index, street_number, reims_matching_row['Partie'])
                        print(f'Check column {street_number} {reims_matching_row["Périmètre élémentaire"]}')
                        sector_to_assign = reims_matching_row["Périmètre élémentaire"]
                        second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
                        break
                    elif street_number % 2 != 0 and reims_matching_row['Partie'] == 'Impair':
                        print(f'Check odd {street_number} {reims_matching_row["Périmètre élémentaire"]}')
                        sector_to_assign = reims_matching_row["Périmètre élémentaire"]
                        second_df.at[bano_index, 'Secteur Elementaire'] = sector_to_assign
                        break
                    break

    return second_df

Depending on the size of the data, you could imagine a subset by street, then a join on a range to speed things up.

You can use conditional_join from pyjanitor to merge both dataframes based on your conditions.

# pip install pyjanitor
import janitor

final_df = (second_df.assign(p=(second_df['numero']%2).map({0: 'Pair', 1: 'Impair'}))
           .conditional_join(first_df, ('numero', 'Début', '>='), ('numero', 'Fin', '<='),
                       ('nom_afnor', 'Voie', '=='), ('p', 'Partie', '=='))
           .assign(**{"Secteur Elementaire":lambda d: d['Périmètre élémentaire']})
           .loc[:, second_df.columns])

End result:

 numero        nom_afnor Secteur Elementaire
      1 AVENUE D EPERNAY           AVRANCHES
      2 AVENUE D EPERNAY  SCULPTEURS JACQUES
      6 AVENUE D EPERNAY  SCULPTEURS JACQUES
      7 AVENUE D EPERNAY           AVRANCHES
      8 AVENUE D EPERNAY  SCULPTEURS JACQUES
      9 AVENUE D EPERNAY           AVRANCHES
     10 AVENUE D EPERNAY  SCULPTEURS JACQUES
     12 AVENUE D EPERNAY  SCULPTEURS JACQUES

You can use a list comprehension with bitwise filtering to get each appropriate elementary sector then convert that to a series and replace the 'Secteur Elementaire' column with the wanted values. For filtering odd and even, I used a dictionary to map 0 to 'Pair' and 1 to 'Impair'.

odd_even = {0: 'Pair', 1: 'Impair'} #Make a dictionary to map odd and evens

second_df['Secteur Elementaire'] = pd.Series([
    (filtered.iloc[0] if not filtered.empty else 'tbd')  #If the filter can't find a result    
    for _, row in second_df.iterrows()
    for filtered in [first_df.loc[
        (first_df['Voie'] == row['nom_afnor']) & #Filter bitwise 
        (row['numero'] <= first_df['Fin']) & 
        (row['numero'] >= first_df['Début']) &
        (first_df['Partie'] == odd_even[row['numero'] % 2]), #Use the dictionary
        'Périmètre élémentaire' #The column you want
    ]]
], index=second_df.index) #Need to specify the index for correct order

Since the series will be applied according to the index, you can use this principle to perform this on subsections of the dataframe if your dataframe is large.

Alternatively, you can make it a function and use apply, which might be more convenient and easier to debug.

def find_perimeter(row):
    filtered = first_df.loc[
        (first_df['Voie'] == row['nom_afnor']) &
        (row['numero'] <= first_df['Fin']) & 
        (row['numero'] >= first_df['Début']) &
        (first_df['Partie'] == odd_even[row['numero'] % 2]),
        'Périmètre élémentaire'
    ]
    return filtered.iloc[0] if not filtered.empty else 'tbd'

second_df['Secteur Elementaire'] = second_df.apply(find_perimeter, axis=1)

Both of these will fill your 'Secteur Elementaire' column with the filtered values. Be aware, if a street could fit multiple possible sectors, it will pick the sequential first in the dataframe because of filtered.iloc[0].

本文标签: