I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no matter the group.

Could this be because the seed value is the same for every combination? I tried to work around it by giving each combination its own seed: I built a "group_date_int" column by combining group_id and date and casting the result to Int64, then passed that column as the seed. That raised the following error:

.sample(n=n_samples, shuffle=True, seed=pl.col("group_date_int"))
TypeError: argument 'seed': 'Expr' object cannot be interpreted as an integer

With a fixed integer seed, as in the example below, every (group_id, date) combination returns the same set of IDs instead of its own random sample.

import pandas as pd
import polars as pl

# MWE
date_range = pd.date_range(start="2010-01-01", end="2025-12-01", freq="MS")
data = []

for current_date in date_range:
    for group_id in ['bd01', 'bd02', 'bd03']:  # Example of 3 different group_ids
        ids = list(range(10))  # Generate 10 IDs for each (group_id, current_date)
        data.extend([(str(current_date.date()), group_id, id_) for id_ in ids])  

# Create Polars DataFrame
df = pl.DataFrame(data, schema=["date", "group_id", "id"], orient="row")  # rows are tuples, so state the orientation explicitly

# Parameters
n_samples = 3  # Number of random samples to pick for each group
SEED = 42  # The seed used for sampling

# Create `selected_samples` by sampling `n_samples` IDs per (group_id, date) combination
selected_samples = (
    df
    .group_by(['group_id', 'date'])
    .agg(
        pl.col("id")
        .sample(n=n_samples, shuffle=True, seed=SEED)  
        .alias("random_ids")
    )
    .explode("random_ids")
    .select(["group_id", "date", "random_ids"])
    .rename({"random_ids": "id"})
)

I also tried the shuffle function, but the result is the same: every (group_id, date) combination returns the IDs 1, 6, 5.

┌──────────┬────────────┬─────┐
│ group_id ┆ date       ┆ id  │
│ ---      ┆ ---        ┆ --- │
│ str      ┆ str        ┆ i64 │
╞══════════╪════════════╪═════╡
│ bd01     ┆ 2025-07-01 ┆ 1   │
│ bd01     ┆ 2025-07-01 ┆ 6   │
│ bd01     ┆ 2025-07-01 ┆ 5   │
│ bd01     ┆ 2012-03-01 ┆ 1   │
│ bd01     ┆ 2012-03-01 ┆ 6   │
│ …        ┆ …          ┆ …   │
│ bd03     ┆ 2024-10-01 ┆ 6   │
│ bd03     ┆ 2024-10-01 ┆ 5   │
│ bd01     ┆ 2010-08-01 ┆ 1   │
│ bd01     ┆ 2010-08-01 ┆ 6   │
│ bd01     ┆ 2010-08-01 ┆ 5   │
└──────────┴────────────┴─────┘
