python - Jupyter Notebook DataFrame Render Steadily Increasing memory usage
Generate a reasonably large dataset:
```python
import pandas as pd
import polars as pl
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=3_000_000,
    n_features=100,
    n_informative=20,
    n_redundant=10,
    n_repeated=0,
    n_classes=2,
    random_state=42,
    shuffle=False,
)

feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_polars = pl.DataFrame(X, schema=feature_names)
y_polars = pl.Series(values=y, name="target")
X_pandas = X_polars.clone().to_pandas()
```
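For scale, each materialized copy of this data is about 2.2 GiB (a quick sanity check, assuming the default `float64` dtype returned by `make_classification`):

```python
# 3_000_000 rows * 100 columns * 8 bytes (float64) ≈ 2.24 GiB per copy
print(X.nbytes / (1024 ** 3))  # ~2.235
```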
If I execute a code block whose result Jupyter renders with both `text/html` and `text/plain` representations:
```python
df = X_polars + 1
df
```
The `df` table is rendered as the output under the cell. The problem begins when I re-run the same cell multiple times: each run adds the full memory footprint of `df` on top of the previous one:
```python
# First time running
import psutil

print(psutil.virtual_memory().available / (1024 ** 3))
df = X_polars + 1
df
```

Out:

```
26.315258026123047
```

followed by the rendered `df` table.

Re-running the same cell, by the fourth execution:

Out:

```
19.47336196899414
```
...
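The numbers are consistent with one full copy of `df` being retained per run:

```python
# Available memory lost between the first and fourth run, per extra run:
print((26.315258026123047 - 19.47336196899414) / 3)  # ≈ 2.28 GiB,
# almost exactly the ~2.24 GiB footprint of one float64 copy of df
```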
The same behaviour persists even when:

- using `X_pandas` instead;
- not defining the `df` variable (e.g., evaluating `X_polars + 1` directly);
- running on Linux-based systems (I am on Windows 10);
- calling `gc.collect()` (see the sketch after this list);
- switching IDEs: a. VS Code, b. Jupyter Notebook, c. JupyterLab.
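My working assumption (unconfirmed) is that IPython's output history keeps the copies alive: every non-`None` cell result is cached in `Out`, which would also explain why `gc.collect()` alone frees nothing. A minimal check in a notebook cell, using IPython's standard `Out` history and the `%reset` magic:

```python
# Inspect and clear IPython's output history. `Out` is the dict IPython
# injects into the user namespace; every rendered cell result is stored
# there, so each re-run can pin another ~2.2 GiB copy of df in memory.
print(len(Out))   # number of cached cell outputs so far
%reset -f out     # clear only the output cache
import gc
gc.collect()      # collection can now actually release the DataFrames
```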
The problem does not occur, however, when I use:

```python
print(X_polars + 1)
print(df)
```
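If the output-history assumption above is right, this is expected: `print()` returns `None`, so the cell's value is never cached and only the plain-text representation is emitted. A possible workaround sketch that keeps the rich HTML table without caching the result:

```python
from IPython.display import display

df = X_polars + 1
display(df)  # renders the HTML table; the cell's own value is None,
             # so nothing should be added to the Out history
```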