Among other descriptive statistics, I want to get some quantiles out of my pandas DataFrame. I can get the quantiles I want a couple of different ways, but I can't find the right way to do it with aggregate. I'd like to use aggregate because it'd be tidy and maybe computationally efficient to get all my stats in one go.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=18860504)
df = pd.DataFrame({
    "dummy": 1,
    "bell": rng.normal(loc=0, scale=1, size=100),
    "fish": rng.poisson(lam=10, size=100),
    "cabin": rng.lognormal(mean=0, sigma=1.0, size=100),
})
quants = [x/5 for x in range(6)]
quantiles = pd.DataFrame({
    # width 3 so the labels match the output below (q000 ... q100)
    "quantile": [f"q{100*q:03n}" for q in quants],
    "bell": df.groupby("dummy")["bell"].quantile(quants),
    "fish": df.groupby("dummy")["fish"].quantile(quants),
})
print(quantiles)
Output:
           quantile       bell  fish
dummy
1     0.0      q000  -2.313461   4.0
      0.2      q020  -0.933831   7.0
      0.4      q040  -0.246860   9.0
      0.6      q060   0.211076  10.0
      0.8      q080   0.685958  13.0
      1.0      q100   3.017258  20.0
I'd like to get these quantiles using groupby().agg(), ideally with programmatically named columns like "bell_q90". Here's an example of the aggregate syntax that feels natural to me:
df.groupby("dummy").agg(
    bell_med=("bell", "median"),
    bell_mean=("bell", "mean"),
    fish_med=("fish", "median"),
    fish_mean=("fish", "mean"),
    # fish_q10=("fish", "quantile(0.1)"), # nothing like it
    # fish_q10=("fish", "quantile", 0.1), # nothing like it
    # fish_q10=("fish", "quantile", kwargs({"q":0.1}), # nothing like it
)
I can imagine generating the columns by iterating over quants and a list of named columns, using Series.agg and then stitching them together, but this seems like a hack. (For example, it would require me to do my "normal" aggregation first and then add the quantiles afterwards.)
my_aggs = dict()
for q in quants:
    for col in ["bell", "fish"]:
        my_aggs[f"{col}_q{100*q:03n}"] = df.groupby("dummy")[col].quantile(q)
print(pd.DataFrame(my_aggs)) # numbers equivalent to those above
Is there a better way?
asked Mar 6 at 16:34 by flies

2 Answers
You could use a function factory to simplify the syntax:
def quantile(q=0.5, **kwargs):
    def f(series):
        return series.quantile(q, **kwargs)
    return f

df.groupby('dummy').agg(
    bell_med=('bell', 'median'),
    bell_mean=('bell', 'mean'),
    fish_med=('fish', 'median'),
    fish_mean=('fish', 'mean'),
    bell_q10=('bell', quantile(0.1)),
    fish_q10=('fish', quantile(0.1)),
)
If you have many combinations, you could also combine this with a dictionary comprehension and parameter expansion:
df.groupby('dummy').agg(
    **{
        'bell_med': ('bell', 'median'),
        'bell_mean': ('bell', 'mean'),
        'fish_med': ('fish', 'median'),
        'fish_mean': ('fish', 'mean'),
    },
    **{
        f'{c}_q{100*q:02n}': (c, quantile(q))
        for q in [0.1] # add more if needed
        for c in ['bell', 'fish']
    }
)
Output:
        bell_med  bell_mean  fish_med  fish_mean   bell_q10  fish_q10
dummy
1      -0.063454  -0.058557      10.0       9.92  -1.553682       6.0
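As a possible variation on the factory above, each generated function can be given its own `__name__`, which pandas uses to label the result columns when `agg` receives a list of functions. The sketch below is only an illustration under assumptions: `named_quantile`, `demo`, the seed and the `q10`-style labels are hypothetical names chosen here, and the data is not the question's data.

```python
import numpy as np
import pandas as pd


def named_quantile(q, **kwargs):
    """Return a function that computes the q-th quantile of a Series."""
    def f(series):
        return series.quantile(q, **kwargs)
    # List-style agg labels each result column with the function's __name__
    f.__name__ = f"q{round(100 * q):02d}"
    return f


# Hypothetical demo data; the seed is arbitrary
rng = np.random.default_rng(seed=0)
demo = pd.DataFrame({
    "dummy": 1,
    "bell": rng.normal(size=100),
    "fish": rng.poisson(lam=10, size=100),
})

# Mix built-in statistics (by name) with the generated quantile functions
funcs = ["median", "mean"] + [named_quantile(q) for q in (0.1, 0.9)]
stats = demo.groupby("dummy")[["bell", "fish"]].agg(funcs)

# Flatten the (column, statistic) MultiIndex into names like "bell_q10"
stats.columns = ["_".join(pair) for pair in stats.columns]
print(stats)
```

This keeps all statistics in a single `agg` call, at the cost of a flattening step for the column names.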
Consider a lambda to wrap the Series.quantile call:
agg_df = df.groupby('dummy').agg(
    bell_med=('bell', 'median'),
    bell_mean=('bell', 'mean'),
    fish_med=('fish', 'median'),
    fish_mean=('fish', 'mean'),
    bell_q10=('bell', lambda ser: ser.quantile(0.1)),
    fish_q10=('fish', lambda ser: ser.quantile(0.1))
)
agg_df
#         bell_med  bell_mean  fish_med  fish_mean   bell_q10  fish_q10
# dummy
# 1      -0.063454  -0.058557      10.0       9.92  -1.553682       6.0
To borrow @mozway's dynamic approach of building the aggregations in a dict, the lambda needs an extra default argument (q=q) inside the dict comprehension, so that each lambda keeps its own value of q instead of the last one in the loop:
agg_df = df.groupby('dummy').agg(
    **{
        'bell_med': ('bell', 'median'),
        'bell_mean': ('bell', 'mean'),
        'fish_med': ('fish', 'median'),
        'fish_mean': ('fish', 'mean'),
    },
    **{
        f'{c}_q{100*q:02n}': (c, lambda ser, q=q: ser.quantile(q))
        for q in [0.1, 0.5, 0.9]
        for c in ['bell', 'fish']
    }
)
agg_df
#         bell_med  bell_mean  fish_med  fish_mean   bell_q10  fish_q10   bell_q50  fish_q50  bell_q90  fish_q90
# dummy
# 1      -0.063454  -0.058557      10.0       9.92  -1.553682       6.0  -0.063454      10.0  1.045002      14.0
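The reason for the `q=q` default can be seen without pandas at all. The following minimal sketch (the names `qs`, `late` and `early` are made up for illustration) contrasts lambdas that look up `q` when they are called with lambdas that store `q` when they are created:

```python
qs = [0.1, 0.5, 0.9]

# Late binding: each lambda looks up q only when called, after the
# comprehension has finished, so all of them see the final value
late = [lambda: q for q in qs]

# Early binding: the default argument is evaluated when the lambda is
# created, so each lambda stores its own q
early = [lambda q=q: q for q in qs]

late_values = [f() for f in late]
early_values = [f() for f in early]
print(late_values)   # [0.9, 0.9, 0.9]
print(early_values)  # [0.1, 0.5, 0.9]
```

Without the default argument, every generated quantile column would therefore be computed with the last value in the list.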