python - dtype changes during collect process in polars dataframe - Stack Overflow

I have a pl.LazyFrame with a number of columns. One of the columns is called signal and is supposed to have dtype=pl.Int8. It only contains 0 and 1.

Calling collect_schema() confirms this.
However, when I actually collect the LazyFrame, the dtype of the resulting column switches to pl.Int32.

I wasn't able to come up with a toy example, so I'm showing the behaviour with my existing pl.LazyFrame. Hopefully somebody can still point me in the right direction.

In [1]: lf.select(pl.col("signal")).collect_schema()
Out[1]: Schema([('signal', Int8)])

In [2]: lf.select(pl.col("signal")).collect()
Out[2]: 
shape: (7_556, 1)
┌────────┐
│ signal │
│ ---    │
│ i32    │
╞════════╡
│ 0      │
│ 0      │
│ 0      │
│ 0      │
│ 1      │
│ …      │
│ 1      │
│ 1      │
│ 1      │
│ 0      │
│ 0      │
└────────┘

In [3]: lf.select(pl.col("signal")).collect().collect_schema()
Out[3]: Schema([('signal', Int32)])

In [4]: lf.select(pl.col("signal")).collect().describe()
Out[4]: 
shape: (9, 2)
┌────────────┬──────────┐
│ statistic  ┆ signal   │
│ ---        ┆ ---      │
│ str        ┆ f64      │
╞════════════╪══════════╡
│ count      ┆ 7556.0   │
│ null_count ┆ 0.0      │
│ mean       ┆ 0.55585  │
│ std        ┆ 0.496904 │
│ min        ┆ 0.0      │
│ 25%        ┆ 0.0      │
│ 50%        ┆ 1.0      │
│ 75%        ┆ 1.0      │
│ max        ┆ 1.0      │
└────────────┴──────────┘

In my view, this looks like a bug, doesn't it?
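
For anyone wanting to narrow this down, here is a minimal sketch that automates the comparison between In [1] and In [3]: it diffs the schema reported by collect_schema() against the schema of the collected frame and, as a stopgap, casts any drifted column back. The names schema_mismatches and collect_with_planned_dtypes are hypothetical helpers, not part of Polars, and the sketch assumes lf is a pl.LazyFrame like the one above. It does not explain why the dtypes diverge; it only makes the divergence visible.

```python
def schema_mismatches(planned, actual):
    """Return {column: (planned_dtype, actual_dtype)} for columns whose dtypes differ.

    Both arguments are name -> dtype mappings, for example the result of
    lf.collect_schema() and the .schema of the collected DataFrame.
    """
    return {
        name: (planned[name], actual[name])
        for name in planned
        if name in actual and planned[name] != actual[name]
    }


def collect_with_planned_dtypes(lf):
    """Collect a LazyFrame, then cast drifted columns back to their planned dtype.

    Hypothetical workaround: it hides the symptom, not the underlying cause.
    """
    planned = lf.collect_schema()  # dtypes the lazy planner reports
    df = lf.collect()              # dtypes the engine actually produces
    drift = schema_mismatches(planned, df.schema)
    if drift:
        print("dtype drift:", drift)
        # Cast only the columns that drifted, using the planned dtype.
        df = df.cast({name: dtypes[0] for name, dtypes in drift.items()})
    return df
```

Running schema_mismatches on intermediate stages of the query (each partial LazyFrame before the next operation is added) should show the first step at which the planned and actual dtypes disagree, which is probably the piece needed for a reproducible toy example.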
