The code below shows a solution I have found for expanding a DataFrame to include the cartesian product of columns A and B, filling in the other columns with null values. Is there a better, more efficient way to do this?
>>> import polars as pl
>>> df = pl.DataFrame({'A': [0, 1, 1],
... 'B': [1, 1, 2],
... 'C': [6, 7, 8]})
>>> df
shape: (3, 3)
┌─────┬─────┬─────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0 ┆ 1 ┆ 6 │
│ 1 ┆ 1 ┆ 7 │
│ 1 ┆ 2 ┆ 8 │
└─────┴─────┴─────┘
>>> df.join(df.select('A').unique().join(df.select('B').unique(), how='cross'), on=['A','B'], how='right')
shape: (4, 3)
┌──────┬─────┬─────┐
│ C ┆ A ┆ B │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞══════╪═════╪═════╡
│ 6 ┆ 0 ┆ 1 │
│ null ┆ 0 ┆ 2 │
│ 7 ┆ 1 ┆ 1 │
│ 8 ┆ 1 ┆ 2 │
└──────┴─────┴─────┘
asked Mar 14 at 7:53 by rindis
2 Answers
This is a requested feature (it is available in R, and in pyjanitor for pandas, as complete).
An alternative approach mentioned in the feature request would be:
(df.select(pl.col(['A', 'B']).unique().sort().implode())
.explode('A')
.explode('B')
.join(df, how='left', on=['A', 'B'])
)
This makes it easy to generalize to more columns: list each extra key column in pl.col([...]) and on=[...], and add one .explode() call for it.
Output:
┌─────┬─────┬──────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪══════╡
│ 0 ┆ 1 ┆ 6 │
│ 0 ┆ 2 ┆ null │
│ 1 ┆ 1 ┆ 7 │
│ 1 ┆ 2 ┆ 8 │
└─────┴─────┴──────┘
You can use janitor.complete; it is a wrapper around Polars functions and replicates what @mozway has already provided in the accepted answer. I am a contributor to the pyjanitor library:
# pip install pyjanitor
import janitor.polars
import polars as pl
df.complete('A','B')
Out[6]:
shape: (4, 3)
┌─────┬─────┬──────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪══════╡
│ 0 ┆ 1 ┆ 6 │
│ 0 ┆ 2 ┆ null │
│ 1 ┆ 1 ┆ 7 │
│ 1 ┆ 2 ┆ 8 │
└─────┴─────┴──────┘
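Copyright notice: python - Expanding polars dataframe with cartesian product of two columns - Stack Overflow. Content contributed by users; the views are the author's own. To repost, contact the author and cite the source: http://www.betaflare.com/web/1744668641a2618687.html. This site only provides storage space, does not claim ownership, and assumes no legal liability; content found to infringe will be removed upon verification.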