I have a pl.LazyFrame with a number of columns. One of the columns is called signal and is supposed to have dtype=pl.Int8. It only contains 0 and 1.
Calling collect_schema confirms this dtype.
However, when I actually collect the dataframe, the dtype switches to pl.Int32.
I wasn't able to come up with a toy example, so I am showing the behaviour with my existing pl.LazyFrame. Hopefully somebody can still point me in the right direction.
In [1]: lf.select(pl.col("signal")).collect_schema()
Out[1]: Schema([('signal', Int8)])
In [2]: lf.select(pl.col("signal")).collect()
Out[2]:
shape: (7_556, 1)
┌────────┐
│ signal │
│ ---    │
│ i32    │
╞════════╡
│ 0      │
│ 0      │
│ 0      │
│ 0      │
│ 1      │
│ …      │
│ 1      │
│ 1      │
│ 1      │
│ 0      │
│ 0      │
└────────┘
In [3]: lf.select(pl.col("signal")).collect().collect_schema()
Out[3]: Schema([('signal', Int32)])
In [4]: lf.select(pl.col("signal")).collect().describe()
Out[4]:
shape: (9, 2)
┌────────────┬──────────┐
│ statistic  ┆ signal   │
│ ---        ┆ ---      │
│ str        ┆ f64      │
╞════════════╪══════════╡
│ count      ┆ 7556.0   │
│ null_count ┆ 0.0      │
│ mean       ┆ 0.55585  │
│ std        ┆ 0.496904 │
│ min        ┆ 0.0      │
│ 25%        ┆ 0.0      │
│ 50%        ┆ 1.0      │
│ 75%        ┆ 1.0      │
│ max        ┆ 1.0      │
└────────────┴──────────┘
To me, this looks like a bug. Is it?