Polars out of core sorting and memory usage
From what I understand, this is a main use case for Polars: processing a dataset larger than RAM, spilling to disk if necessary. Yet I am unable to achieve this in a Kubernetes environment. To replicate the problem locally, I launched a Docker container with a low memory limit:
docker run -it --memory=500m --rm -v `pwd`:/app python:3.12 /bin/bash
# pip install polars==1.26.0
I checked that Docker set the memory limit in cgroups for the container.
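For reference, a check along these lines confirmed it from inside the container (a minimal sketch, assuming the host uses cgroup v2; under cgroup v1 the path would be /sys/fs/cgroup/memory/memory.limit_in_bytes):

from pathlib import Path

# Read the memory limit the container runtime wrote into the cgroup filesystem.
limit = Path("/sys/fs/cgroup/memory.max").read_text().strip()
print(limit)  # expect 524288000 (500 MiB) with --memory=500m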
Then I ran a script that loads a moderately large dataframe (23M Parquet file, 158M uncompressed) with scan_parquet, performs a sort, and prints the head:
import polars as pl

source = "parquet/central_west.df"
df = pl.scan_parquet(source, low_memory=True)  # lazy scan of the Parquet file
query = df.sort("station_code").head()
print(query.collect(engine="streaming"))
This leads to the process getting killed. It works with a smaller dataframe or a larger memory limit. Is Polars not reading the limit correctly, or is it unable to work within a limit that low? I understand the "new" streaming engine is still in beta, so I tried the same script with Polars 1.22.0, but the result was the same. This seems like a very simple and common use case, so I hope I am just missing a configuration trick.
On a hunch, and based on a similar question, I tried setting POLARS_IDEAL_MORSEL_SIZE=100, but it made no difference; I feel like I am grasping at straws here.
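For completeness, this is how I applied it, shown as a sketch (setting it before importing Polars is my own precaution; I have not verified when the variable is actually read):

import os

# Set before importing polars, in case the variable is read at import time.
os.environ["POLARS_IDEAL_MORSEL_SIZE"] = "100"

import polars as pl  # then run the same query as above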