I have a pandas DataFrame df that has very many columns, including some named "S-xx" with xx ranging from 1 to 20. All these 20 columns contain labels; let's say they're A,B,C and N. What I want to do is remove all those rows of df that contain label N in any of the S-xx columns. A tiny example:

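import pandas as pd

data = {"Subject": ["101", "102", "201", "202"],
        "S-1": ["A", "N", "N", "B"],
        "S-2": ["B", "A", "N", "B"],
        "S-3": ["A", "C", "B", "N"],
        # ... S-4 through S-19 omitted ...
        "S-20": ["C", "A", "N", "N"]}

df = pd.DataFrame(data)
df = df.set_index("Subject")  # set_index returns a new DataFrame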

Which looks something like this when tabulated:

         S-1  S-2  S-3 ... S-20
Subject            
101       A    B    A  ...  C
102       N    A    C  ...  A
201       N    N    B  ...  N
202       B    B    N  ...  N

I would like to only keep rows in which none of the columns S-x have value N.

Of course I can write the usual df[(df["S-1"] != "N") & ... ], but since I have many S-x columns, I wonder whether there is a more elegant pandas way of applying the same condition to all columns named S-x and then combining the results.

asked Feb 11 at 17:49 by Polhek
  • Check which values are not equal to "N", then use all with axis=1 to verify that the whole row meets your condition -> df[df.ne('N').all(axis=1)] – Triky Commented Feb 11 at 18:15
  • This question is similar to: Drop row in pandas dataframe if any value in the row equals zero. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. – ouroboros1 Commented Feb 11 at 19:17
  • As per the suggested duplicate, and combined with df.filter, you could do: df[df.filter(like="S-").ne("N").all(axis=1)], or if you need to be really precise, use the regex option, something like: df[df.filter(regex=r'^S-\d+$').ne("N").all(axis=1)]. – ouroboros1 Commented Feb 11 at 19:19
  • It's different from the suggested duplicate because my dataframe contains much more than just these columns that I need to filter by (which is mentioned in the question but yeah, my tiny example skipped that...). However your second comment answers my question perfectly! It was the filter part that I was missing, thanks! Could you add this as an answer so I can accept it? – Polhek Commented Feb 11 at 19:38
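
For reference, the filter-based approach from the comments can be sketched as a self-contained script. The Age column and its values are assumptions added purely for illustration, standing in for the other columns the asker mentions; S-4 to S-19 are left out.

```python
import pandas as pd

# Hypothetical sample data: "Age" stands in for the many non-S columns
# mentioned in the question; S-4 .. S-19 are left out for brevity.
df = pd.DataFrame(
    {
        "Subject": ["101", "102", "201", "202"],
        "Age": [31, 45, 27, 52],
        "S-1": ["A", "N", "N", "B"],
        "S-2": ["B", "A", "N", "B"],
        "S-3": ["A", "C", "B", "N"],
        "S-20": ["C", "A", "N", "N"],
    }
).set_index("Subject")

# Select only the columns named S-<digits>; every other column is ignored.
s_cols = df.filter(regex=r"^S-\d+$")

# True for rows in which no S-column equals "N".
keep = s_cols.ne("N").all(axis=1)

# Index the full frame with the mask, so the non-S columns are preserved.
result = df[keep]
print(result)
```

The mask is computed on the S-columns only but applied to the full DataFrame, so a stray "N" in some unrelated column would not cause a row to be dropped.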

1 Answer

Select the inverse of what you want to drop.

First, list the column names you care about: colnames = [f"S-{i}" for i in range(1, 21)] (in the demo below I've shortened this list to fit the example data).

Then functools.reduce with operator.or_ combines the per-column checks, since any column could contain an "N", and ~ negates the combined mask so that you select only the rows where no column has an "N".

In [67]: df
Out[67]:
  Subject S-1 S-2 S-3 S-20
0     101   A   B   A    C
1     102   N   A   C    A
2     201   N   N   B    N
3     202   B   B   N    N

In [68]: colnames = ['S-1', 'S-2', 'S-3', 'S-20']

In [69]: df[~functools.reduce(operator.or_, (df[col].eq("N") for col in colnames))]
Out[69]:
  Subject S-1 S-2 S-3 S-20
0     101   A   B   A    C

Don't forget to import functools and operator.
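
As a rough, self-contained sketch of the same idea with the imports spelled out, the script below rebuilds the demo frame and constructs the mask with functools.reduce. The second mask, built with DataFrame.any(axis=1), is not part of the original answer; it is included only as an assumed equivalent for comparison.

```python
import functools
import operator

import pandas as pd

# Same sample frame as in the demo above (S-4 .. S-19 omitted).
df = pd.DataFrame(
    {
        "Subject": ["101", "102", "201", "202"],
        "S-1": ["A", "N", "N", "B"],
        "S-2": ["B", "A", "N", "B"],
        "S-3": ["A", "C", "B", "N"],
        "S-20": ["C", "A", "N", "N"],
    }
)

colnames = ["S-1", "S-2", "S-3", "S-20"]

# OR together one boolean Series per column: True where any column is "N".
has_n = functools.reduce(operator.or_, (df[col].eq("N") for col in colnames))

# Alternative mask without reduce: compare the sub-frame, then collapse per row.
has_n_alt = df[colnames].eq("N").any(axis=1)

# Keep only the rows where no listed column has an "N".
kept = df[~has_n]
print(kept)
```

Both masks flag the same rows here; the reduce version works column by column, while the any(axis=1) version compares the selected sub-frame in one step.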

Tags: python. Source: "Choose rows from pandas dataframe based on a condition for many columns", Stack Overflow