I'm trying to account for room usage only during business hours and abridge an event duration if it runs past the end of business hours.

I have a dataframe like this:

import polars as pl
from datetime import datetime

df = pl.DataFrame({
    'name': 'foo',
    'start': datetime.fromisoformat('2025-01-01 08:00:00'),
    'end': datetime.fromisoformat('2025-01-01 18:00:00'), # ends after business hours
    'business_end': datetime.fromisoformat('2025-01-01 17:00:00')
})

I want to create a duration column measured from start to end, unless end is after business_end, in which case it should be measured from start to business_end instead. For this, I tried the following:

df.with_columns(
    duration=pl.col("end") - pl.col("start")
    if pl.col("end") <= pl.col("business_end")
    else pl.col("business_end") - pl.col("start")
)

This gives an error:

TypeError: the truth value of an Expr is ambiguous

Thoughts about how to produce the desired row from the conditional?

I can use filter() to find rows where event ends are after business ends, create a frame of those, replace the end time value, merge back in, etc. but I was hoping to keep the original data and only add a new column.

2 Answers

Short answer

Use when/then/otherwise instead of if/else:

    df.with_columns(
        duration=pl.when(pl.col("end") <= pl.col("business_end"))
        .then(pl.col("end") - pl.col("start"))
        .otherwise(pl.col("business_end") - pl.col("start"))
    )

Background

Polars works with expressions inside contexts. What does that mean?

Contexts are methods such as with_columns, select, group_by, agg, etc.

The inputs to contexts are expressions. Expressions usually start with pl.col() or pl.lit(). They have many methods that themselves return expressions, which makes them chainable.

The thing about expressions is that they don't hold values; they are just instructions. One way to see that clearly is to assign an expression to a normal variable, like end = pl.col("end"). You can do that without any DataFrame existing. Once you have a df, you can use that expression in a context: df.select(end). Only when the select context receives the expression pl.col("end") does it go and fetch the column. You could also build a more complicated expression, like my_sum = (pl.col("a") * 2 + pl.col("b").pow(3)), and then chain off of it: df.select(my_sum * 2 + 5).

Now, getting back to the if: because pl.col("end") doesn't have any values associated with it, Python can't evaluate if pl.col("end") <= pl.col("other"), which is why you're getting that error. The comparison just builds another expression, while Python's if needs a plain True or False immediately and can't be overloaded to produce an expression, so you can't use it to choose between expressions inside a context.

Instead, you can use the when/then/otherwise construct.

@DeanMacGregor already provided a great answer on the origin of the TypeError.

For completeness, the expected outcome could also be computed without explicitly relying on conditionals as follows.

df.with_columns(
    duration=pl.min_horizontal("end", "business_end") - pl.col("start")
)

shape: (1, 5)
┌──────┬─────────────────────┬─────────────────────┬─────────────────────┬──────────────┐
│ name ┆ start               ┆ end                 ┆ business_end        ┆ duration     │
│ ---  ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---          │
│ str  ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ duration[μs] │
╞══════╪═════════════════════╪═════════════════════╪═════════════════════╪══════════════╡
│ foo  ┆ 2025-01-01 08:00:00 ┆ 2025-01-01 18:00:00 ┆ 2025-01-01 17:00:00 ┆ 9h           │
└──────┴─────────────────────┴─────────────────────┴─────────────────────┴──────────────┘
