python - Choose rows from pandas dataframe based on a condition for many columns - Stack Overflow

Updated: 2025-03-110

I have a pandas DataFrame df with a large number of columns, including some named "S-xx" with xx ranging from 1 to 20. All 20 of these columns contain labels; let's say they're A, B, C and N. I want to remove every row of df that contains the label N in any of the S-xx columns. A tiny example:

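import pandas as pd

# Labels are strings; columns S-4 through S-19 are elided here.
data = {"Subject": ["101", "102", "201", "202"],
        "S-1": ["A", "N", "N", "B"],
        "S-2": ["B", "A", "N", "B"],
        "S-3": ["A", "C", "B", "N"],
        # ...
        "S-20": ["C", "A", "N", "N"]}

df = pd.DataFrame(data)
df = df.set_index("Subject")  # set_index returns a new DataFrame, so reassign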

Which looks something like this when tabulated:

         S-1  S-2  S-3 ... S-20
Subject            
101       A    B    A  ...  C
102       N    A    C  ...  A
201       N    N    B  ...  N
202       B    B    N  ...  N

I would like to keep only the rows in which none of the S-xx columns has the value N.

Of course I can write the usual df[(df["S-1"] != "N") & ... ] but since I have many S-xx columns, I wonder whether there is a better, more elegant pandas way of applying the same condition to all columns named S-xx and then combining the results.

Asked Feb 11 at 17:49 by Polhek

Check whether each value is not equal to "N", then use all with axis=1 to verify that every value in the row meets the condition -> df[df.ne('N').all(axis=1)] – Triky Commented Feb 11 at 18:15
This question is similar to: Drop row in pandas dataframe if any value in the row equals zero. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. – ouroboros1 Commented Feb 11 at 19:17
As per the suggested duplicate, and combined with df.filter, you could do: df[df.filter(like="S-").ne("N").all(axis=1)], or if you need to be really precise, use the regex option, something like: df[df.filter(regex=r'^S-\d+$').ne("N").all(axis=1)]. – ouroboros1 Commented Feb 11 at 19:19
It's different from the suggested duplicate because my dataframe contains much more than just these columns that I need to filter by (which is mentioned in the question but yeah, my tiny example skipped that...). However your second comment answers my question perfectly! It was the filter part that I was missing, thanks! Could you add this as an answer so I can accept it? – Polhek Commented Feb 11 at 19:38
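
The filter-based suggestion from the comments above can be sketched as a self-contained script. This is only an illustration under some assumptions: just four of the twenty S-xx columns are included, and the "Group" column is a hypothetical stand-in for the unrelated columns the asker mentions (it deliberately contains an "N" to show that it does not influence the result).

```python
import pandas as pd

# Hypothetical miniature of the asker's data: four of the S-xx columns,
# plus an invented "Group" column standing in for the unrelated columns.
df = pd.DataFrame(
    {
        "Subject": ["101", "102", "201", "202"],
        "Group": ["N", "x", "y", "z"],  # has an "N", but is not an S-xx column
        "S-1": ["A", "N", "N", "B"],
        "S-2": ["B", "A", "N", "B"],
        "S-3": ["A", "C", "B", "N"],
        "S-20": ["C", "A", "N", "N"],
    }
).set_index("Subject")

# Select only the columns whose name is exactly "S-<digits>".
s_cols = df.filter(regex=r"^S-\d+$")

# True for rows in which every S-xx value differs from "N".
keep = s_cols.ne("N").all(axis=1)

# Index the full frame with the mask so the other columns are retained.
result = df[keep]
print(result)
```

Because the mask is built from the filtered columns but applied to the full DataFrame, the unrelated columns are kept in the output and never take part in the comparison.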

1 Answer

Select the inverse of what you want to drop:

List the column names you care about, e.g. colnames = [f"S-{i}" for i in range(1, 21)] (the demo below uses a shorter list to fit the example data).

functools.reduce with operator.or_ combines the per-column comparisons into a single mask that is True wherever any of the columns holds an "N"; the ~ then negates that mask, so you select only the rows where no column has an "N".

In [67]: df
Out[67]:
  Subject S-1 S-2 S-3 S-20
0     101   A   B   A    C
1     102   N   A   C    A
2     201   N   N   B    N
3     202   B   B   N    N

In [68]: colnames = ['S-1', 'S-2', 'S-3', 'S-20']

In [69]: df[~functools.reduce(operator.or_, (df[col].eq("N") for col in colnames))]
Out[69]:
  Subject S-1 S-2 S-3 S-20
0     101   A   B   A    C

Don't forget to import functools and operator.
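
For reference, the session above might look like this as a standalone script with its imports in place. It is a sketch on the same four-column demo data, not a definitive implementation; the second mask, built with DataFrame.any(axis=1), is added here only as a cross-check and is not part of the original answer.

```python
import functools
import operator

import pandas as pd

# Same demo data as the session above ("Subject" left as a regular column).
df = pd.DataFrame(
    {
        "Subject": ["101", "102", "201", "202"],
        "S-1": ["A", "N", "N", "B"],
        "S-2": ["B", "A", "N", "B"],
        "S-3": ["A", "C", "B", "N"],
        "S-20": ["C", "A", "N", "N"],
    }
)

colnames = ["S-1", "S-2", "S-3", "S-20"]

# OR together one boolean Series per column: True where any column is "N".
has_n = functools.reduce(operator.or_, (df[col].eq("N") for col in colnames))
reduced = df[~has_n]

# Cross-check (not in the original answer): the same mask built in one call.
vectorised = df[~df[colnames].eq("N").any(axis=1)]

print(reduced)
print(reduced.equals(vectorised))
```

Both masks select the same rows here; the reduce form makes the per-column OR explicit, while the any(axis=1) form leaves the looping to pandas.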
