python - Expanding polars dataframe with cartesian product of two columns - Stack Overflow

Updated: 2025-04-15

The code below shows a solution I found for expanding a dataframe to include the cartesian product of columns A and B, filling the other columns with null values. Is there a better, more efficient way to do this?

>>> import polars as pl
>>> df = pl.DataFrame({'A': [0, 1, 1],
...                    'B': [1, 1, 2],
...                    'C': [6, 7, 8]})
>>> df
shape: (3, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 1   ┆ 6   │
│ 1   ┆ 1   ┆ 7   │
│ 1   ┆ 2   ┆ 8   │
└─────┴─────┴─────┘

>>> df.join(df.select('A').unique().join(df.select('B').unique(), how='cross'), on=['A','B'], how='right')
shape: (4, 3)
┌──────┬─────┬─────┐
│ C    ┆ A   ┆ B   │
│ ---  ┆ --- ┆ --- │
│ i64  ┆ i64 ┆ i64 │
╞══════╪═════╪═════╡
│ 6    ┆ 0   ┆ 1   │
│ null ┆ 0   ┆ 2   │
│ 7    ┆ 1   ┆ 1   │
│ 8    ┆ 1   ┆ 2   │
└──────┴─────┴─────┘

asked Mar 14 at 7:53 by rindis

2 Answers

This is a requested feature in polars (the equivalent is available as complete in R's tidyr and in pyjanitor for pandas).

An alternative approach mentioned in the feature request would be:

(df.select(pl.col(['A', 'B']).unique().sort().implode())
   .explode('A')
   .explode('B')
   .join(df, how='left', on=['A', 'B'])
)

This approach generalizes easily to a larger number of columns.

Output:

┌─────┬─────┬──────┐
│ A   ┆ B   ┆ C    │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ i64  │
╞═════╪═════╪══════╡
│ 0   ┆ 1   ┆ 6    │
│ 0   ┆ 2   ┆ null │
│ 1   ┆ 1   ┆ 7    │
│ 1   ┆ 2   ┆ 8    │
└─────┴─────┴──────┘

You can use janitor.complete; it is a wrapper around polars functions and replicates what @mozway has already provided in the accepted answer. I am a contributor to the pyjanitor library:

# pip install pyjanitor
import janitor.polars
import polars as pl

df.complete('A','B')
Out[6]:
shape: (4, 3)
┌─────┬─────┬──────┐
│ A   ┆ B   ┆ C    │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ i64  │
╞═════╪═════╪══════╡
│ 0   ┆ 1   ┆ 6    │
│ 0   ┆ 2   ┆ null │
│ 1   ┆ 1   ┆ 7    │
│ 1   ┆ 2   ┆ 8    │
└─────┴─────┴──────┘
