I am trying to filter my Polars DataFrame for dates that are nearest to a given date.

For example:

import polars
import datetime

data = {
    "date": ["2025-01-01", "2025-01-01", "2025-01-01", "2026-01-01"],
    "value": [1, 2, 3, 4],
}

df = polars.DataFrame(data).with_columns([polars.col("date").cast(polars.Date)])
shape: (4, 2)
┌────────────┬───────┐
│ date       ┆ value │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 1     │
│ 2025-01-01 ┆ 2     │
│ 2025-01-01 ┆ 3     │
│ 2026-01-01 ┆ 4     │
└────────────┴───────┘

Given a date, say:

date = datetime.date(2024, 12, 31)

I want to keep only the rows whose date is closest to my required date.

I know that I can do the following:

result = df.with_columns(
    diff=(polars.col("date") - date).abs()
).filter(
    polars.col("diff") == polars.min("diff")
)
shape: (3, 3)
┌────────────┬───────┬──────────────┐
│ date       ┆ value ┆ diff         │
│ ---        ┆ ---   ┆ ---          │
│ date       ┆ i64   ┆ duration[ms] │
╞════════════╪═══════╪══════════════╡
│ 2025-01-01 ┆ 1     ┆ 1d           │
│ 2025-01-01 ┆ 2     ┆ 1d           │
│ 2025-01-01 ┆ 3     ┆ 1d           │
└────────────┴───────┴──────────────┘

Is there a more succinct way to achieve this (without creating a new column, for example)?

asked yesterday by FISR, edited yesterday by jonrsharpe

  • Just curious, why not use as pl in your import? – Dean MacGregor
  • In words you say "closest", but your code finds the smallest deviation and returns every row tied for it. It seems like what you want is more like filter(diff < pl.duration(days=1)); adjust the duration to whatever counts as close for you. – Dean MacGregor

1 Answer

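You don't need the temporary column; filter on the expression directly (the snippets here assume import polars as pl, whereas the question uses a plain import polars):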

df.filter((m:=(pl.col('date')-date).abs()).min() == m)

Or, without the walrus operator:

diff = (pl.col('date')-date).abs()
df.filter(diff.min() == diff)

Output:

┌────────────┬───────┐
│ date       ┆ value │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 1     │
│ 2025-01-01 ┆ 2     │
│ 2025-01-01 ┆ 3     │
└────────────┴───────┘
