I have a Polars DataFrame where each id can appear multiple times with different state values (either 1 or 2). I want to count how many unique ids have only state 1, only state 2, or both states 1 and 2.
import polars as pl
df = pl.DataFrame({
"id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 20, 20, 20],
"state": [1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2]
})
I want to count how many unique ids fall into each category:
• Only state 1 (e.g., IDs that only have 1)
• Only state 2 (e.g., IDs that only have 2)
• Both states 1 and 2 (e.g., IDs that have both 1 and 2)
Expected Result (Example):
State combination [1] -> 20 IDs
State combination [2] -> 15 IDs
State combination [1, 2] -> 30 IDs
asked Feb 13 at 20:40 by Simon
3 Answers
You need two group_bys: the first collects the states of each id, and the second groups by those state combinations to count the number of ids in each.
(
    df
    .group_by("id")
    .agg(pl.col("state").unique().sort())
    .group_by("state")
    .len()
)
You could group by the id and use .all() and .any() to check the states.
(df.group_by("id")
   .agg(
      one = (pl.col.state == 1).all(),
      two = (pl.col.state == 2).all(),
      both = (pl.col.state == 1).any() & (pl.col.state == 2).any()
      # both = pl.lit(1).is_in("state") & pl.lit(2).is_in("state")
   )
   # .select(pl.exclude("id").sum())
)
shape: (20, 4)
┌─────┬───────┬───────┬───────┐
│ id ┆ one ┆ two ┆ both │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ bool ┆ bool ┆ bool │
╞═════╪═══════╪═══════╪═══════╡
│ 6 ┆ false ┆ true ┆ false │
│ 3 ┆ false ┆ true ┆ false │
│ 2 ┆ true ┆ false ┆ false │
│ 12 ┆ true ┆ false ┆ false │
│ 16 ┆ false ┆ false ┆ true │
│ … ┆ … ┆ … ┆ … │
│ 9 ┆ false ┆ false ┆ true │
│ 13 ┆ false ┆ true ┆ false │
│ 8 ┆ false ┆ true ┆ false │
│ 15 ┆ false ┆ false ┆ true │
│ 10 ┆ false ┆ false ┆ true │
└─────┴───────┴───────┴───────┘
The .sum() of each bool column gives the count for that category (uncomment the .select line above).
shape: (1, 3)
┌─────┬─────┬──────┐
│ one ┆ two ┆ both │
│ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 │
╞═════╪═════╪══════╡
│ 6 ┆ 7 ┆ 7 │
└─────┴─────┴──────┘
For future reference, please provide what the correct output should be instead of just an example output.
You can perform a group by, take the unique states for each id, then take the value counts of the resulting combinations:
# Sort the unique states so that [1, 2] and [2, 1] count as the same combination
combinations = df.group_by('id').agg(pl.col('state').unique().sort())
counts = combinations.select(pl.col('state').value_counts().alias('counts'))
print(counts.unnest('counts'))
assert (counts.select(pl.col('counts').struct.field('count').sum()) == df.n_unique('id')).item()
# Alternatively, as a single expression:
print(df.select(
    pl.col('state').unique().implode()
    .over('id', mapping_strategy='explode')
    .value_counts()
    .struct.unnest()
))