
I have two DataFrames that look like this:

import pandas as pd

df = pd.DataFrame({'PERSONALNUMMER': {4756: '0209740',4820: '0234212',4855: '0251297',4750: '0209326',4992: '4000404'},
 'MANDANT': {4756: 'OM', 4820: 'OM', 4855: 'OM', 4750: 'OM', 4992: 'OM'},
 'Fachabteilung': {4756: 'HA2300',4820: 'HA2300',4855: 'HA2300',4750: 'HA2300',4992: 'HA2300'},
 'FACHEXPERTISE': {4756: 'AQ10',4820: 'AQ10',4855: 'AQ10',4750: 'AQ10',4992: 'AQ10'},
 'Leistungsgruppe': {4756: pd.NA,4820: 'Endoprothetik Knie',4855: 'Allgemeine Chirurgie',4750: 'Wirbelsaeuleneingriffe',4992: pd.NA}})


MAP = pd.DataFrame({'MANDANT': {238: 'OM', 239: 'OM', 240: 'OM', 241: 'OM'},
 'Fachabteilung': {238: 'HA2300', 239: 'HA2300', 240: 'HA2300', 241: 'HA2300'},
 'FACHEXPERTISE': {238: 'AQ10', 239: 'AQ10', 240: 'AQ10', 241: 'AQ10'},
 'Leistungsgruppe': {238: 'Allgemeine Chirurgie',
  239: 'Endoprothetik Huefte',240: 'Endoprothetik Knie',241: 'Revision Huefte'},
 'VK': {238: 3,239: 14,240: 28,241: 22}})

I want to do the following: the column Leistungsgruppe of my df DataFrame has empty entries, and I need to fill them with random elements picked from my MAP DataFrame based on conditions.

So I want to iterate over df and pick element(s) for each empty entry.

The conditions are: MANDANT, Fachabteilung and FACHEXPERTISE must be the same in both DataFrames, and if MAP has more than 3 entries for that combination, I need to narrow the choice down to 3 elements.

Right now I have no idea how to do this, because my Python knowledge of iterating over single rows is almost non-existent.

I tried to come up with a way to do this with iterrows, but I cannot find a quick solution right now.
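Since the question mentions iterrows, here is a minimal row-by-row sketch for comparison with the vectorized answer. It assumes "narrow it down to 3" means drawing 3 random candidates once per (MANDANT, Fachabteilung, FACHEXPERTISE) combination and then picking one of them for each empty entry; the names keys, pools and rng are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# the example data from the question, written with lists instead of dicts
df = pd.DataFrame({
    'PERSONALNUMMER': ['0209740', '0234212', '0251297', '0209326', '4000404'],
    'MANDANT': ['OM'] * 5,
    'Fachabteilung': ['HA2300'] * 5,
    'FACHEXPERTISE': ['AQ10'] * 5,
    'Leistungsgruppe': [pd.NA, 'Endoprothetik Knie', 'Allgemeine Chirurgie',
                        'Wirbelsaeuleneingriffe', pd.NA],
}, index=[4756, 4820, 4855, 4750, 4992])
MAP = pd.DataFrame({
    'MANDANT': ['OM'] * 4,
    'Fachabteilung': ['HA2300'] * 4,
    'FACHEXPERTISE': ['AQ10'] * 4,
    'Leistungsgruppe': ['Allgemeine Chirurgie', 'Endoprothetik Huefte',
                        'Endoprothetik Knie', 'Revision Huefte'],
    'VK': [3, 14, 28, 22],
}, index=[238, 239, 240, 241])

keys = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
rng = np.random.default_rng(0)  # seeded only to make runs repeatable
pools = {}  # cache: key combination -> at most 3 candidate values

for idx, row in df.iterrows():
    if not pd.isna(row['Leistungsgruppe']):
        continue  # only the empty entries are filled
    key = tuple(row[keys])
    if key not in pools:
        # MAP rows with the same MANDANT, Fachabteilung and FACHEXPERTISE
        mask = MAP[keys].eq(row[keys]).all(axis=1)
        candidates = MAP.loc[mask, 'Leistungsgruppe'].to_numpy()
        # narrow down to at most 3 distinct random candidates
        n = min(len(candidates), 3)
        pools[key] = list(rng.choice(candidates, size=n, replace=False)) if n else []
    if pools[key]:
        # pick one of the (at most 3) candidates for this empty entry
        df.at[idx, 'Leistungsgruppe'] = str(rng.choice(pools[key]))

print(df)
```

Rows whose combination has no match in MAP are simply left empty in this sketch.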

Asked Mar 25 at 7:58 by Mike; edited Mar 25 at 17:44 by President James K. Polk.
  • You should provide an example of the expected output; this will make your question easier to understand and more useful to future readers ;) – mozway, Mar 25 at 8:09
  • Especially the part about the "more than 3 entries", where the exact logic is currently left for us to assume. – mozway, Mar 25 at 8:09
  • Yes Mike. Please update your question with your expected output and make your requirement clearer. I am unable to understand your requirement. – Soudipta Dutta, Mar 25 at 9:13

1 Answer


First select up to 3 random rows per group (shuffle MAP with sample(frac=1), then keep the first 3 rows of each group with groupby.head; unlike groupby.sample(3), this does not raise for groups with fewer than 3 rows). Then merge them in row order, using a secondary key created with groupby.cumcount, and fillna:

import pandas as pd

# columns to use as the group key
group = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
# column to fill
col = 'Leistungsgruppe'

# up to 3 random rows per group: shuffle, then keep the first 3 of each group
sample = MAP.sample(frac=1).groupby(group).head(3)[group+[col]]

# enumerate the NA values within each group of df (0, 1, 2, ...)
key_df = df.groupby(group)[col].transform(lambda x: x.isna().cumsum().sub(1))

# merge in order: the n-th missing value of a group gets the n-th sampled row;
# merge doesn't keep the index, so restore the original one with set_axis
fill = pd.merge(df[group], sample,
                left_on=group+[key_df],
                right_on=group+[sample.groupby(group).cumcount()],
                how='left'
               )[col].set_axis(df.index)

# fill the missing values
df[col] = df[col].fillna(fill)
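To make the merge key concrete, here is a small sketch that evaluates key_df on the question's data (only the columns it needs are rebuilt). The two empty rows, 4756 and 4992, get the keys 0 and 1, so they are matched with the first and second sampled row of their group; the keys of the non-empty rows are irrelevant because fillna never uses them.

```python
import pandas as pd

df = pd.DataFrame({
    'MANDANT': ['OM'] * 5,
    'Fachabteilung': ['HA2300'] * 5,
    'FACHEXPERTISE': ['AQ10'] * 5,
    'Leistungsgruppe': [pd.NA, 'Endoprothetik Knie', 'Allgemeine Chirurgie',
                        'Wirbelsaeuleneingriffe', pd.NA],
}, index=[4756, 4820, 4855, 4750, 4992])

group = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
col = 'Leistungsgruppe'

# running count of missing values inside each group, starting at 0
key_df = df.groupby(group)[col].transform(lambda x: x.isna().cumsum().sub(1))

# only the keys of the rows that will actually be filled matter
print(key_df[df[col].isna()].tolist())  # [0, 1]
```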

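If a group in df can have more than 3 values to fill, the merge above leaves the extra ones empty, because their keys (3, 4, ...) have no matching sampled row. In that case you could draw a random key instead, use a modulo so that it never exceeds the number of sampled rows in the group (values may then repeat), and use it in place of key_df in the merge above:

import numpy as np

# MAP group size for each row of df (assumes every group of df exists in MAP)
map_size = df[group].merge(MAP.groupby(group, as_index=False).size(), how='left')['size'].values
key_df = np.random.randint(3, size=len(df)) % map_size  # random 0-2, kept below the group size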
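A deterministic variant of the same idea, offered as an alternative rather than the answer's code: keep the sequential key from the first snippet and wrap it with the number of sampled rows in the group, so the 4th missing value reuses the 1st sampled row, and so on. The column name pos and the test data (four empty entries in one group) are made up for illustration, and every group of df is assumed to exist in MAP.

```python
import pandas as pd

# made-up data: four empty entries in one group, more than the 3 sampled values
df = pd.DataFrame({
    'MANDANT': ['OM'] * 5,
    'Fachabteilung': ['HA2300'] * 5,
    'FACHEXPERTISE': ['AQ10'] * 5,
    'Leistungsgruppe': [pd.NA, 'Endoprothetik Knie', pd.NA, pd.NA, pd.NA],
}, index=[4756, 4820, 4855, 4750, 4992])
MAP = pd.DataFrame({
    'MANDANT': ['OM'] * 4,
    'Fachabteilung': ['HA2300'] * 4,
    'FACHEXPERTISE': ['AQ10'] * 4,
    'Leistungsgruppe': ['Allgemeine Chirurgie', 'Endoprothetik Huefte',
                        'Endoprothetik Knie', 'Revision Huefte'],
})
group = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
col = 'Leistungsgruppe'

# up to 3 random rows per group, numbered 0, 1, 2 in the hypothetical column 'pos'
sample = MAP.sample(frac=1, random_state=0).groupby(group).head(3)[group + [col]]
sample = sample.assign(pos=sample.groupby(group).cumcount())

# how many sampled rows each row of df can draw from
# (assumes every group of df also exists in MAP)
sizes = sample.groupby(group).size().rename('n').reset_index()
n = df[group].merge(sizes, how='left')['n'].to_numpy()

# sequential key of the missing values, wrapped around the pool size
seq = df.groupby(group)[col].transform(lambda x: x.isna().cumsum().sub(1))
key = seq.to_numpy() % n

# left merge keeps df's row order; restore df's index afterwards
fill = (df[group].assign(pos=key)
        .merge(sample, on=group + ['pos'], how='left')[col]
        .set_axis(df.index))
df[col] = df[col].fillna(fill)
print(df[col])
```

Compared with the random modulo key, this spreads the first 3 missing values over 3 different sampled rows before any value repeats.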

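Another option uses a custom function: first aggregate MAP to keep up to 3 random values per group, then draw from them with numpy.random.choice inside a groupby transform on df:

import numpy as np
import pandas as pd

# up to 3 random values per group, stored as a list
samples = MAP.groupby(group)[col].agg(lambda x: list(x.sample(min(len(x), 3))))

# fill the missing values of one group by drawing (with replacement) from its pool;
# x.name is the group key, a tuple of (MANDANT, Fachabteilung, FACHEXPERTISE)
def fill_resample(x, pool=samples):
    return x.fillna(pd.Series(np.random.choice(pool.loc[x.name],
                                               size=len(x),
                                               replace=True),
                              index=x.index))

df[col] = df.groupby(group)[col].transform(fill_resample)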
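Both options draw at random, so each run can fill different values. If repeatable results are wanted, one possibility (an assumption, not part of the answer) is to route every draw of the custom-function approach through a seeded numpy.random.Generator; the seed 42 and the name rng are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'MANDANT': ['OM'] * 5,
    'Fachabteilung': ['HA2300'] * 5,
    'FACHEXPERTISE': ['AQ10'] * 5,
    'Leistungsgruppe': [pd.NA, 'Endoprothetik Knie', 'Allgemeine Chirurgie',
                        'Wirbelsaeuleneingriffe', pd.NA],
}, index=[4756, 4820, 4855, 4750, 4992])
MAP = pd.DataFrame({
    'MANDANT': ['OM'] * 4,
    'Fachabteilung': ['HA2300'] * 4,
    'FACHEXPERTISE': ['AQ10'] * 4,
    'Leistungsgruppe': ['Allgemeine Chirurgie', 'Endoprothetik Huefte',
                        'Endoprothetik Knie', 'Revision Huefte'],
})
group = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
col = 'Leistungsgruppe'

rng = np.random.default_rng(42)  # illustrative seed, so repeated runs give the same fill

# up to 3 distinct random values per group, stored as a plain list
samples = MAP.groupby(group)[col].agg(
    lambda x: list(rng.choice(x.to_numpy(), size=min(len(x), 3), replace=False)))

def fill_resample(x, pool=samples):
    # x.name is the group key tuple; draw one pool value per row, with replacement
    values = rng.choice(pool.loc[x.name], size=len(x), replace=True)
    return x.fillna(pd.Series(values, index=x.index))

df[col] = df.groupby(group)[col].transform(fill_resample)
print(df)
```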

Example output (one possible result, since the values are drawn at random):

     PERSONALNUMMER MANDANT Fachabteilung FACHEXPERTISE         Leistungsgruppe
4756        0209740      OM        HA2300          AQ10    Allgemeine Chirurgie
4820        0234212      OM        HA2300          AQ10      Endoprothetik Knie
4855        0251297      OM        HA2300          AQ10    Allgemeine Chirurgie
4750        0209326      OM        HA2300          AQ10  Wirbelsaeuleneingriffe
4992        4000404      OM        HA2300          AQ10    Endoprothetik Huefte
