I have a pandas DataFrame df with very many columns, including some named "S-xx" with xx ranging from 1 to 20. All 20 of these columns contain labels; let's say the labels are A, B, C and N. What I want to do is remove all rows of df that contain the label N in any of the S-xx columns. A tiny example:
import pandas as pd

data = {"Subject": ["101", "102", "201", "202"],
        "S-1": ["A", "N", "N", "B"],
        "S-2": ["B", "A", "N", "B"],
        "S-3": ["A", "C", "B", "N"],
        # ...
        "S-20": ["C", "A", "N", "N"]}
df = pd.DataFrame(data)
df = df.set_index("Subject")  # set_index returns a new frame, so assign it back
Which looks something like this when tabulated:
S-1 S-2 S-3 ... S-20
Subject
101 A B A ... C
102 N A C ... A
201 N N B ... N
202 B B N ... N
I would like to keep only the rows in which none of the S-x columns have the value N. Of course I could write the usual df[(df["S-1"] != "N") & ...], but since I have many S-x columns, I wonder whether there is a better, more elegant pandas way of applying the same condition to all columns named S-x and then gathering the results.
1 Answer
Select the inverse of what you want to drop:
List all the column names you care about with colnames = [f"S-{i}" for i in range(1, 21)] (I've modified this to fit the example data in the demo below). functools.reduce with operator.or_ handles the fact that any column could contain an "N", and the ~ negates that condition so you select only the rows where no column has an "N".
In [67]: df
Out[67]:
Subject S-1 S-2 S-3 S-20
0 101 A B A C
1 102 N A C A
2 201 N N B N
3 202 B B N N
In [68]: colnames = ['S-1', 'S-2', 'S-3', 'S-20']
In [69]: df[~functools.reduce(operator.or_, (df[col].eq("N") for col in colnames))]
Out[69]:
Subject S-1 S-2 S-3 S-20
0 101 A B A C
Don't forget to import functools and operator.
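For completeness, here is the whole approach as a runnable script. This is a sketch using the truncated example data from the question (columns S-4 through S-19 omitted):

```python
import functools
import operator

import pandas as pd

df = pd.DataFrame({"Subject": ["101", "102", "201", "202"],
                   "S-1": ["A", "N", "N", "B"],
                   "S-2": ["B", "A", "N", "B"],
                   "S-3": ["A", "C", "B", "N"],
                   "S-20": ["C", "A", "N", "N"]}).set_index("Subject")

colnames = ["S-1", "S-2", "S-3", "S-20"]

# OR together one boolean Series per column, then negate: the mask is True
# only for rows where none of the listed columns equals "N".
mask = ~functools.reduce(operator.or_, (df[col].eq("N") for col in colnames))
result = df[mask]
print(result)  # only Subject 101 survives
```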
With df.filter(), you could do: df[df.filter(like="S-").ne("N").all(axis=1)], or if you need to be really precise, use the regex option, something like: df[df.filter(regex=r'^S-\d+$').ne("N").all(axis=1)]. – ouroboros1 Commented Feb 11 at 19:19