I have two DataFrames that look like this:
import pandas as pd
df = pd.DataFrame({'PERSONALNUMMER': {4756: '0209740',4820: '0234212',4855: '0251297',4750: '0209326',4992: '4000404'},
'MANDANT': {4756: 'OM', 4820: 'OM', 4855: 'OM', 4750: 'OM', 4992: 'OM'},
'Fachabteilung': {4756: 'HA2300',4820: 'HA2300',4855: 'HA2300',4750: 'HA2300',4992: 'HA2300'},
'FACHEXPERTISE': {4756: 'AQ10',4820: 'AQ10',4855: 'AQ10',4750: 'AQ10',4992: 'AQ10'},
'Leistungsgruppe': {4756: pd.NA,4820: 'Endoprothetik Knie',4855: 'Allgemeine Chirurgie',4750: 'Wirbelsaeuleneingriffe',4992: pd.NA}})
MAP = pd.DataFrame({'MANDANT': {238: 'OM', 239: 'OM', 240: 'OM', 241: 'OM'},
'Fachabteilung': {238: 'HA2300', 239: 'HA2300', 240: 'HA2300', 241: 'HA2300'},
'FACHEXPERTISE': {238: 'AQ10', 239: 'AQ10', 240: 'AQ10', 241: 'AQ10'},
'Leistungsgruppe': {238: 'Allgemeine Chirurgie',
239: 'Endoprothetik Huefte',240: 'Endoprothetik Knie',241: 'Revision Huefte'},
'VK': {238: 3,239: 14,240: 28,241: 22}})
I want to do the following:
My df DataFrame has empty entries in the column Leistungsgruppe, and I need to fill them with random elements picked from my MAP DataFrame, subject to some conditions.
So I want to iterate over df and pick element(s).
The conditions are: MANDANT, Fachabteilung, and FACHEXPERTISE need to be the same, and if there are more than 3 entries for that combination in my MAP DataFrame, I need to narrow it down to 3 elements.
Right now I have no clue how to do this, because my Python knowledge of iterating over single rows is almost non-existent.
I tried to think of a way to do this with iterrows, but I cannot come up with a quick solution right now.
- You should provide an example of the expected output; this will make your question easier to understand and more useful to future readers ;) – mozway Commented Mar 25 at 8:09
- Especially the part about the "more than 3 entries", whose exact logic is currently left for us to assume. – mozway Commented Mar 25 at 8:09
- Yes Mike. Please update your question with your expected output and make your requirement clearer. I am unable to understand your requirement. – Soudipta Dutta Commented Mar 25 at 9:13
1 Answer
First select up to 3 values per group with groupby.sample, then merge in order of the rows (with a secondary key created with groupby.cumcount) and fillna:
# columns to use as grouping keys
group = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
# column to fill
col = 'Leistungsgruppe'
# sample 3 rows per group
# (raises ValueError if a group in MAP has fewer than 3 rows)
sample = MAP.groupby(group)[group+[col]].sample(3)
# compute a key to enumerate the NA values within each group
key_df = df.groupby(group)[col].transform(lambda x: x.isna().cumsum().sub(1))
# merge in order, then restore the original index
# (merge doesn't keep the index)
fill = pd.merge(df[group], sample,
                left_on=group+[key_df],
                right_on=group+[sample.groupby(group).cumcount()],
                how='left'
                )[col].set_axis(df.index)
# fill the missing values
df[col] = df[col].fillna(fill)
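groupby.sample(3) raises a ValueError when a group has fewer than 3 rows. A minimal sketch of one possible workaround, assuming it is acceptable to shuffle the lookup table first and then keep the first rows of each group. The helper name sample_up_to, the frame MAP_demo and its extra 'XX' row are hypothetical and only serve the illustration:

```python
import pandas as pd

group = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
col = 'Leistungsgruppe'


def sample_up_to(frame, keys, n, random_state=None):
    """Return at most n randomly chosen rows per group (hypothetical helper)."""
    # shuffle all rows, then keep the first n rows of each group
    shuffled = frame.sample(frac=1, random_state=random_state)
    return shuffled.groupby(keys, sort=False).head(n)


# toy lookup table: one group with 4 rows, one made-up group with a single row
MAP_demo = pd.DataFrame({
    'MANDANT': ['OM', 'OM', 'OM', 'OM', 'XX'],
    'Fachabteilung': ['HA2300'] * 4 + ['HA0100'],
    'FACHEXPERTISE': ['AQ10'] * 4 + ['AQ01'],
    col: ['Allgemeine Chirurgie', 'Endoprothetik Huefte',
          'Endoprothetik Knie', 'Revision Huefte', 'Geriatrie'],
})

picked = sample_up_to(MAP_demo, group, 3, random_state=0)
sizes = picked.groupby(group).size()
print(sizes)
```

The large group is cut down to 3 rows while the single-row group is kept as is, so the result could stand in for sample in the merge above.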
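If df can have more than 3 values to fill per group, you could use a modulo to generate the merging key:
import numpy as np
# number of rows in MAP for the group of each row of df
map_size = df[group].merge(MAP.groupby(group, as_index=False).size())['size'].values
# random key per row, wrapped with a modulo so it stays within the group size
key_df = np.random.randint(3, size=len(df)) % map_size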
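To see the random key at work end to end, here is a small self-contained sketch under stated assumptions: the toy frames df_demo and MAP_demo, the helper column _key and the seeded generator are all made up for the example, and the key is stored as an explicit column rather than passed to merge as an array.

```python
import numpy as np
import pandas as pd

group = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
col = 'Leistungsgruppe'
rng = np.random.default_rng(0)  # seeded only to make the demo repeatable

# toy data: five missing values in one group, four candidates in the lookup
df_demo = pd.DataFrame({'MANDANT': ['OM'] * 5,
                        'Fachabteilung': ['HA2300'] * 5,
                        'FACHEXPERTISE': ['AQ10'] * 5,
                        col: [pd.NA] * 5})
MAP_demo = pd.DataFrame({'MANDANT': ['OM'] * 4,
                         'Fachabteilung': ['HA2300'] * 4,
                         'FACHEXPERTISE': ['AQ10'] * 4,
                         col: ['Allgemeine Chirurgie', 'Endoprothetik Huefte',
                               'Endoprothetik Knie', 'Revision Huefte']})

# keep 3 candidates per group and number them 0, 1, 2 within the group
sample = MAP_demo.groupby(group).sample(3, random_state=0)
sample = sample.assign(_key=sample.groupby(group).cumcount())

# number of sampled candidates available for each row of df_demo
counts = sample.groupby(group, as_index=False).size()
n_avail = df_demo[group].merge(counts, how='left')['size'].to_numpy()

# random key per row, wrapped into the valid range with a modulo
keys = rng.integers(3, size=len(df_demo)) % n_avail

# merge on group columns plus key, then restore the original index
fill = (df_demo[group].assign(_key=keys)
        .merge(sample, on=group + ['_key'], how='left')[col]
        .set_axis(df_demo.index))
df_demo[col] = df_demo[col].fillna(fill)
print(df_demo)
```

All five rows get a value even though only 3 candidates were sampled, because the same candidate may be drawn more than once.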
Another option uses a custom function. First aggregate MAP to keep 3 values per group, then sample with numpy.random.choice in a groupby operation on df:
# get samples per group
samples = MAP.groupby(group)[col].agg(lambda x: x.sample(3))
# fill with a custom function
def fill_resample(x, pool=samples):
    return x.fillna(pd.Series(np.random.choice(pool[x.name],
                                               size=len(x),
                                               replace=True),
                              index=x.index))
df[col] = df.groupby(group)[col].transform(fill_resample)
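A self-contained variant of this idea that can be run as is, offered as a sketch rather than a drop-in replacement. It assumes the pools can be stored as plain lists in a dict, that groups with fewer than 3 candidates should simply use what is available, and that groups absent from the lookup stay unfilled. The names df_demo, MAP_demo, pools and fill_from_pool are made up for the example:

```python
import numpy as np
import pandas as pd

group = ['MANDANT', 'Fachabteilung', 'FACHEXPERTISE']
col = 'Leistungsgruppe'
rng = np.random.default_rng(0)  # seeded only to make the demo repeatable

df_demo = pd.DataFrame(
    {'MANDANT': 'OM', 'Fachabteilung': 'HA2300', 'FACHEXPERTISE': 'AQ10',
     col: [pd.NA, 'Endoprothetik Knie', 'Allgemeine Chirurgie',
           'Wirbelsaeuleneingriffe', pd.NA]},
    index=[4756, 4820, 4855, 4750, 4992])
MAP_demo = pd.DataFrame(
    {'MANDANT': 'OM', 'Fachabteilung': 'HA2300', 'FACHEXPERTISE': 'AQ10',
     col: ['Allgemeine Chirurgie', 'Endoprothetik Huefte',
           'Endoprothetik Knie', 'Revision Huefte']})

# at most 3 candidates per group, stored as lists keyed by the group tuple
pools = {key: values.sample(min(3, len(values)), random_state=0).tolist()
         for key, values in MAP_demo.groupby(group)[col]}


def fill_from_pool(s):
    """Fill missing values of one group with random draws from its pool."""
    pool = pools.get(s.name)
    if pool is None:  # no matching group in the lookup: leave untouched
        return s
    missing = s.isna()
    out = s.copy()
    out[missing] = rng.choice(pool, size=int(missing.sum()), replace=True)
    return out


df_demo[col] = df_demo.groupby(group)[col].transform(fill_from_pool)
print(df_demo)
```

Existing values are left alone and only the two missing rows receive a draw from the pool.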
Example output:
     PERSONALNUMMER  MANDANT  Fachabteilung  FACHEXPERTISE         Leistungsgruppe
4756        0209740       OM         HA2300           AQ10    Allgemeine Chirurgie
4820        0234212       OM         HA2300           AQ10      Endoprothetik Knie
4855        0251297       OM         HA2300           AQ10    Allgemeine Chirurgie
4750        0209326       OM         HA2300           AQ10  Wirbelsaeuleneingriffe
4992        4000404       OM         HA2300           AQ10    Endoprothetik Huefte