python - compute named quantiles in pandas using groupby aggregate - Stack Overflow


Among other descriptive statistics, I want to get some quantiles out of my pandas DataFrame. I can get the quantiles I want a couple of different ways, but I can't find the right way to do it with aggregate. I'd like to use aggregate because it'd be tidy and maybe computationally efficient to get all my stats in one go.

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=18860504)
df = pd.DataFrame({
    "dummy": 1,
    "bell": rng.normal(loc=0, scale=1, size=100),
    "fish": rng.poisson(lam=10, size=100),
    "cabin": rng.lognormal(mean=0, sigma=1.0, size=100),
})
quants = [x/5 for x in range(6)]
quantiles = pd.DataFrame({
    "quantile": [f"q{100*q:03n}" for q in quants],  # q000, q020, ..., q100
    "bell": df.groupby("dummy")["bell"].quantile(quants),
    "fish": df.groupby("dummy")["fish"].quantile(quants),
})
print(quantiles)

Output:

          quantile      bell  fish
dummy                             
1     0.0     q000 -2.313461   4.0
      0.2     q020 -0.933831   7.0
      0.4     q040 -0.246860   9.0
      0.6     q060  0.211076  10.0
      0.8     q080  0.685958  13.0
      1.0     q100  3.017258  20.0
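For comparison, here is a minimal sketch (an editorial addition, not from the original post) that pivots the long result above into one row per group with programmatically named columns. It assumes a pandas version whose DataFrameGroupBy.quantile accepts a list of quantiles, and it rebuilds the same toy data so it runs on its own.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=18860504)
df = pd.DataFrame({
    "dummy": 1,
    "bell": rng.normal(loc=0, scale=1, size=100),
    "fish": rng.poisson(lam=10, size=100),
    "cabin": rng.lognormal(mean=0, sigma=1.0, size=100),
})
quants = [x / 5 for x in range(6)]

# Long format: one row per (group, quantile), one column per variable.
long = df.groupby("dummy")[["bell", "fish"]].quantile(quants)

# Move the quantile level (the last index level) into the columns,
# giving a (variable, quantile) column MultiIndex.
wide = long.unstack()

# Flatten to names such as "bell_q020"; round() guards against float noise.
wide.columns = [f"{col}_q{round(100 * q):03d}" for col, q in wide.columns]
print(wide)
```

unstack() moves the innermost index level (the quantiles) into the columns; the list comprehension then flattens each (column, quantile) pair into a single name.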

I'd like to get these quantiles using groupby().agg(), ideally with programmatically named columns like "bell_q90". Here's an example of the aggregate syntax that feels natural to me:

df.groupby("dummy").agg(
    bell_med=("bell", "median"),
    bell_mean=("bell", "mean"),
    fish_med=("fish", "median"),
    fish_mean=("fish", "mean"),
    # None of the following forms exist:
    # fish_q10=("fish", "quantile(0.1)"),
    # fish_q10=("fish", "quantile", 0.1),
    # fish_q10=("fish", "quantile", kwargs({"q": 0.1})),
)
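None of those forms exist: in named aggregation each keyword must map to a (column, aggfunc) pair, so extra arguments have to be bound into the function itself. A minimal sketch, assuming functools.partial is an acceptable way to do that (pandas then calls the partial once per group with that group's Series). The data below is a hypothetical stand-in.

```python
from functools import partial

import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same column names as the question.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "dummy": 1,
    "bell": rng.normal(size=100),
    "fish": rng.poisson(lam=10, size=100),
})

out = df.groupby("dummy").agg(
    bell_med=("bell", "median"),
    fish_mean=("fish", "mean"),
    # partial binds q, so pandas only has to supply each group's Series.
    bell_q10=("bell", partial(pd.Series.quantile, q=0.1)),
    fish_q90=("fish", partial(pd.Series.quantile, q=0.9)),
)
print(out)
```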

I can imagine generating the columns by iterating over quants and a list of column names, computing each grouped quantile separately and then stitching the results together, but this seems like a hack. (For example, it would require me to do my "normal" aggregation first and add the quantiles afterwards.)

my_aggs = dict()
for q in quants:
    for col in ["bell", "fish"]:
        my_aggs[f"{col}_q{100*q:03n}"] = df.groupby("dummy")[col].quantile(q) 

print(pd.DataFrame(my_aggs)) # numbers equivalent to those above
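The stitching can still be a single expression. A minimal sketch, assuming a hypothetical stand-in DataFrame with the same columns (two groups here, to show the alignment): build the quantile columns with a dict comprehension and attach them to the usual named aggregation with DataFrame.join, which aligns rows on the shared dummy index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; "dummy" takes the values 1 and 2.
rng = np.random.default_rng(seed=1)
df = pd.DataFrame({
    "dummy": rng.integers(1, 3, size=100),
    "bell": rng.normal(size=100),
    "fish": rng.poisson(lam=10, size=100),
})
quants = [0.1, 0.5, 0.9]
grouped = df.groupby("dummy")

# Ordinary named aggregation.
basic = grouped.agg(bell_mean=("bell", "mean"), fish_mean=("fish", "mean"))

# One Series per (column, quantile), keyed by the desired column name.
quant_cols = pd.DataFrame({
    f"{col}_q{round(100 * q):02d}": grouped[col].quantile(q)
    for q in quants
    for col in ["bell", "fish"]
})

# Both frames are indexed by "dummy", so join aligns them group by group.
stats = basic.join(quant_cols)
print(stats)
```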

Is there a better way?


asked Mar 6 at 16:34 by flies


2 Answers

You could use a function factory to simplify the syntax:

def quantile(q=0.5, **kwargs):
    """Return a one-argument function computing the q-th quantile of a Series."""
    def f(series):
        return series.quantile(q, **kwargs)
    return f
    
df.groupby('dummy').agg(
    bell_med=('bell', 'median'),
    bell_mean=('bell', 'mean'),
    fish_med=('fish', 'median'),
    fish_mean=('fish', 'mean'),
    bell_q10=('bell', quantile(0.1)),
    fish_q10=('fish', quantile(0.1)),
)
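A possible variation, sketched here as an assumption rather than part of the answer: pandas labels each function in a list-style aggregation by its __name__, so giving the closure a descriptive name lets a dict-of-lists agg call produce readable columns that flatten to names such as bell_q10. The data is a hypothetical stand-in.

```python
import numpy as np
import pandas as pd


def quantile(q=0.5, **kwargs):
    """Return a one-argument function computing the q-th quantile of a Series."""
    def f(series):
        return series.quantile(q, **kwargs)
    # Label pandas uses when f appears in a list of aggregations.
    f.__name__ = f"q{round(100 * q):02d}"
    return f


rng = np.random.default_rng(seed=2)
df = pd.DataFrame({
    "dummy": 1,
    "bell": rng.normal(size=100),
    "fish": rng.poisson(lam=10, size=100),
})

funcs = ["median", "mean", quantile(0.1), quantile(0.9)]
res = df.groupby("dummy").agg({"bell": funcs, "fish": funcs})

# Columns are (column, function name) pairs; join them into flat names.
res.columns = [f"{col}_{name}" for col, name in res.columns]
print(res)
```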

If you have many combinations, you could also combine this with a dictionary comprehension and keyword-argument unpacking (**):

df.groupby('dummy').agg(**{'bell_med': ('bell', 'median'),
                           'bell_mean': ('bell', 'mean'),
                           'fish_med': ('fish', 'median'),
                           'fish_mean': ('fish', 'mean'),
                           },
                        **{f'{c}_q{100*q:02n}': (c, quantile(q))
                           for q in [0.1] # add more if needed
                           for c in ['bell', 'fish']
                          }
                       )

Output:


       bell_med  bell_mean  fish_med  fish_mean  bell_q10  fish_q10
dummy                                                              
1     -0.063454  -0.058557      10.0       9.92 -1.553682       6.0
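The two ** unpackings in that call are merged into one set of keyword arguments before pandas sees them, so a generated name that collides with a hand-written one makes Python raise TypeError. A minimal pure-Python sketch, using a hypothetical stand-in agg function that simply returns its keywords:

```python
def agg(**named):
    """Hypothetical stand-in for DataFrameGroupBy.agg: returns its keywords."""
    return named


fixed = {"bell_med": ("bell", "median"), "bell_mean": ("bell", "mean")}
generated = {
    # (c, q) stands in for the real (column, aggfunc) pair.
    f"{c}_q{round(100 * q):02d}": (c, q)
    for q in [0.1, 0.9]
    for c in ["bell", "fish"]
}

# Keyword order is preserved: fixed names first, then generated ones.
merged = agg(**fixed, **generated)
print(list(merged))


def collides():
    """True if repeating a keyword across two ** unpackings raises TypeError."""
    try:
        agg(**fixed, **{"bell_med": ("bell", "median")})
    except TypeError:
        return True
    return False


print(collides())
```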

Consider using a lambda to wrap the Series.quantile call:

agg_df = df.groupby('dummy').agg(
    bell_med=('bell', 'median'),
    bell_mean=('bell', 'mean'),
    fish_med=('fish', 'median'),
    fish_mean=('fish', 'mean'),
    bell_q10=('bell', lambda ser: ser.quantile(0.1)),
    fish_q10=('fish', lambda ser: ser.quantile(0.1))
)

agg_df
#        bell_med  bell_mean  fish_med  fish_mean  bell_q10  fish_q10
# dummy                                                              
# 1     -0.063454  -0.058557      10.0       9.92 -1.553682       6.0

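To borrow @mozway's dynamic solution that builds the aggregations in a dictionary, each lambda needs an extra default argument (q=q) inside the dict comprehension. Without it, every lambda looks up q only when pandas calls it, after the comprehension has finished, so all of them would use the last quantile. With the default, each lambda keeps its own value: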

agg_df = df.groupby('dummy').agg(
    **{
        'bell_med': ('bell', 'median'),
        'bell_mean': ('bell', 'mean'),
        'fish_med': ('fish', 'median'),
        'fish_mean': ('fish', 'mean'),
    },
    **{
        f'{c}_q{100*q:02n}': (c, lambda ser, q=q: ser.quantile(q))
        for q in [0.1, 0.5, 0.9]
        for c in ['bell', 'fish']
    }
)

agg_df
#        bell_med  bell_mean  fish_med  fish_mean  bell_q10  fish_q10  bell_q50  fish_q50  bell_q90  fish_q90
# dummy                                                                                                      
# 1     -0.063454  -0.058557      10.0       9.92 -1.553682       6.0 -0.063454      10.0  1.045002      14.0
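A minimal sketch on a toy Series (hypothetical data) showing the effect of the q=q default:

```python
import pandas as pd

s = pd.Series(range(11), dtype="float64")  # 0.0, 1.0, ..., 10.0
qs = [0.1, 0.5, 0.9]

# Without a default argument, every lambda shares the comprehension's q,
# which is looked up at call time and is 0.9 by then.
late = {q: (lambda ser: ser.quantile(q)) for q in qs}
# With q=q, the current value is stored when each lambda is created.
bound = {q: (lambda ser, q=q: ser.quantile(q)) for q in qs}

late_vals = {q: f(s) for q, f in late.items()}    # all equal s.quantile(0.9)
bound_vals = {q: f(s) for q, f in bound.items()}  # s.quantile(q) for each q
print(late_vals)
print(bound_vals)
```

The function factory in the first answer sidesteps this, because each call to quantile(q) creates a fresh enclosing scope that holds its own q.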
