首页 IT技术正文内容

python - How does pd.where work with callables as parameters? - Stack Overflow

IT技术

更新时间：2025-03-091

admin管理员组
文章数量:1290098

The basics of using Pandas where with callables seems simple.

np.random.seed(0)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df["test"] = range(1,9)

def MyBool(x):
    print(1)
    return ( x > 0 )

def MyFunc(x1):
    print(1)
    return x1['A']

df.where(
    cond  = lambda x: MyBool(x),
    other = lambda x: MyFunc(x) ,
    )

In the code above, I am replacing the values of all columns with the value of column A whenever the value of the col is less than 0. Note, I know I don't need to use callables for this simple example.

Based on my analysis, this is what is happening under the hood.

First, MyFunc is evaluated where the argument is the df itself. This returns a 8x1 df (=A)

Second, the MyBool is evaluated which returns a 8x5 boolean df.

Third, (not sure about this last step) for all entries (i,j) where MyBool returned False, the value of the i'th row of the output of MyFunc is used to replace the current value of the df.

This leads me on to my question: how does this extend to the cases when MyFunc returns a dataframe with several columns and rows? How does the function determine which entries need to be replaced and with which values?

For illustrative purposes, suppose now that we want to divide B and C by 2 when test is equal to 5. The code I have provided below works but I don't quite understand how it determines which entries are to be replaced and with which values.

MyBool still returns a one dimensional vector but MyFunc returns a dataframe. If the previous logic I explained was correct, then shouldn't it replace each False entry with the dataframe? Indeed, if this were the case, the resulting dataframe should be bigger than the input df.

I've been reading the documentation and playing with different examples but can't figure this one out.

def MyBool(x):
    output = x.test != 5
    return output

def MyFunc(x1):
    x1.loc[ x1.test == 5, ["B", "C"] ] /= 2
    return x1

df.where(
    cond  = lambda x: MyBool(x),
    other = lambda x: MyFunc(x.copy()),
    axis  = 0
    )

The basics of using Pandas where with callables seems simple.

np.random.seed(0)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df["test"] = range(1,9)

def MyBool(x):
    print(1)
    return ( x > 0 )

def MyFunc(x1):
    print(1)
    return x1['A']

df.where(
    cond  = lambda x: MyBool(x),
    other = lambda x: MyFunc(x) ,
    )

In the code above, I am replacing the values of all columns with the value of column A whenever the value of the col is less than 0. Note, I know I don't need to use callables for this simple example.

Based on my analysis, this is what is happening under the hood.

First, MyFunc is evaluated where the argument is the df itself. This returns a 8x1 df (=A)

Second, the MyBool is evaluated which returns a 8x5 boolean df.

Third, (not sure about this last step) for all entries (i,j) where MyBool returned False, the value of the i'th row of the output of MyFunc is used to replace the current value of the df.

This leads me on to my question: how does this extend to the cases when MyFunc returns a dataframe with several columns and rows? How does the function determine which entries need to be replaced and with which values?

For illustrative purposes, suppose now that we want to divide B and C by 2 when test is equal to 5. The code I have provided below works but I don't quite understand how it determines which entries are to be replaced and with which values.

MyBool still returns a one dimensional vector but MyFunc returns a dataframe. If the previous logic I explained was correct, then shouldn't it replace each False entry with the dataframe? Indeed, if this were the case, the resulting dataframe should be bigger than the input df.

I've been reading the documentation and playing with different examples but can't figure this one out.

def MyBool(x):
    output = x.test != 5
    return output

def MyFunc(x1):
    x1.loc[ x1.test == 5, ["B", "C"] ] /= 2
    return x1

df.where(
    cond  = lambda x: MyBool(x),
    other = lambda x: MyFunc(x.copy()),
    axis  = 0
    )

Share Improve this question edited Feb 19 at 18:56 asked Feb 19 at 17:14 Leksa99 936 bronze badges

2 It would be great to provide a reproducible example (seed the random numbers with np.random.seed(0)), and the matching expected output, for clarity. – mozway Commented Feb 19 at 17:38

Add a comment |

1 Answer 1

Sorted by: Reset to default 2

The logic is quite simple, for each False in the output of the cond callable, the matching value in the result of other will be used as replacement. If other is a scalar, this value is used.

The matching value is identified by position if the callable returns an array, and by alignment for a DataFrame:

df = pd.DataFrame({'A': [1, 0], 'B': [0, 1]})
df.where(cond=lambda x: x==0,
         other=lambda x: pd.DataFrame({'A': [10, 20]}, index=[1, 0]))
#     A    B
# 0  20  0.0
# 1   0  NaN

df.where(cond=lambda x: x==0, other=lambda x: [[7,8],[9,10]])
#    A   B
# 0  7   0
# 1  0  10

Therefore, your MyFunc functions should return a scalar, a DataFrame (that will be aligned), or an array of the same shape as the input.

You can modify it to broadcast the values to all columns:

def MyFunc(x1):
    print(1)
    return np.broadcast_to(x1['A'].to_frame().values, df.shape)

Example:

df = pd.DataFrame([[1, -1,  0,  0],
                   [2,  0, -1,  0],
                   [3,  0,  0, -1]],
                  columns=['A', 'B', 'C', 'D'])
def MyBool(x):
    return x >= 0

def MyFunc(x):
    return np.broadcast_to(x['A'].to_frame().values, df.shape)

out = df.where(cond=MyBool, other=MyFunc)

#    A  B  C  D
# 0  1  1  0  0
# 1  2  0  2  0
# 2  3  0  0  3

Note that the callables should NOT modify the DataFrame in place.

This should be avoided:

def MyFunc(x1):
    x1.loc[ x1.test == 5, ["B", "C"] ] /= 2
    return x1

and could be replaced by a simple (without using where):

df.loc[df['test'] == 5, ['B', 'C']] /= 2

本文标签： pythonHow does pdwhere work with callables as parametersStack Overflow

版权声明：本文标题：python - How does pd.where work with callables as parameters? - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1741481213a2381184.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

更多相关文章

python - How does pd.where work with callables as parameters? - Stack Overflow

IT技术

14小时前

The basics of using Pandas where with callables seems simple.np.random.seed(0)df = pd.DataFrame(np.ra

发表评论

全部评论 0

暂无评论

推荐文章

jquery - How to access files attribute of a file element using JavaScript - Stack Overflow

python - Handling Occasional 100 MB API Responses in FastAPI: Polars vs. NumPyPandas? - Stack Overflow

Can I convert password to md5 in javascript before sending to php page? - Stack Overflow

python - Why I'm having an error on my robotframework automation - Stack Overflow

javascript - vue.js filters in v-for - Stack Overflow

热门文章

最新文章