How do I write a query like (A or B) and C in Python Polars?

import polars as pl
import numpy as np

df = pl.DataFrame(
    data={
        "a": [0.0, 0.0,    0.0,    0.0,    np.nan, np.nan, np.nan],
        "b": [0.0, 0.0,    np.nan, np.nan, 0.0,    0.0,    np.nan],
        "c": [0.0, np.nan, 0.0,    np.nan, 0.0,    np.nan, np.nan]
    }
)

df.with_columns(
    ((pl.col('a').is_not_nan() | pl.col('b').is_not_nan())
     & pl.col('c').is_not_nan()).alias('Keep'))
df_actual = df.filter(pl.col("Keep") is True)

print("df\n", df)
print("df_expect\n", df_expect)
print("df_actual\n", df_actual)

df
 shape: (7, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 ┆ NaN │
│ 0.0 ┆ NaN ┆ 0.0 │
│ 0.0 ┆ NaN ┆ NaN │
│ NaN ┆ 0.0 ┆ 0.0 │
│ NaN ┆ 0.0 ┆ NaN │
│ NaN ┆ NaN ┆ NaN │
└─────┴─────┴─────┘

df_expect
 shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ NaN ┆ 0.0 │
│ NaN ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 ┆ 0.0 │
└─────┴─────┴─────┘
df_actual
 shape: (0, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
└─────┴─────┴─────┘

I expected to keep the rows where either a or b is 0.0 (not NaN) and c is 0.0. The Polars documentation says to use | for "or" and & for "and". I believe I have the logic right: ((a is not NaN) or (b is not NaN)) and (c is not NaN). However, the output is wrong.

asked Feb 22 at 23:14 by Steve Maguire; edited Feb 23 at 0:15 by jqurious

1 Answer

The logic looks fine.

One issue is that Polars operations are not "in-place" (apart from some niche methods).

.with_columns() returns a new frame - which you are not using.

Another issue is the usage of is with Expr objects.

>>> type(pl.col("Keep"))
polars.expr.expr.Expr
>>> pl.col("Keep") is True
False

You end up running .filter(False) - hence the result of 0 rows.

If you add the column:

df_actual = df.with_columns(
    ((pl.col("a").is_not_nan() | pl.col("b").is_not_nan())
     & pl.col("c").is_not_nan()).alias("Keep")
)

You can just pass the name (or pl.col) directly.

df_actual = df_actual.filter("Keep")

You could also chain the calls, e.g. df.with_columns(...).filter(...)

Or you can filter the predicates directly.

df_actual = df.filter(
    (pl.col("a").is_not_nan() | pl.col("b").is_not_nan())
     & pl.col("c").is_not_nan()
)
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ NaN ┆ 0.0 │
│ NaN ┆ 0.0 ┆ 0.0 │
└─────┴─────┴─────┘
