I'm trying to create a new column in my dataframe which has a 1 if there is a match with some keywords, and 0 if not.

I can currently check a single column against the keywords like this:

import pandas as pd

# Sample DataFrame
data = {'column1': ['sp123', 'abc', 'sp456', 'def'], 'column2': ['tp1234', 'abc', 'sp4256', 'def'], 'column3': ['syp123', 'abc', 'sp456', 'def']}
df = pd.DataFrame(data)

# List of keywords
keywords = ['sp', 'xyz']

# Add a new column 'flag' and set it to 1 if any keyword is in 'column1'
df['flag'] = df['column1'].apply(lambda x: 1 if any(keyword in x for keyword in keywords) else 0)

print(df)

What I would like to do is loop over all the columns as well, with something like this:

for col in list(data.keys()):
    df['flag'] = df[col].apply(lambda x: 1 if any(keyword in x for keyword in keywords) else 0)

I tried this, and the values of flag all come out as 0, 0, 0, 0, when they should be 1, 0, 1, 0.

Asked Nov 22, 2024 at 15:06 by TIC-FLY; edited Nov 22, 2024 at 15:08 by Mark Rotteveel.

1 Answer

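The loop in the question overwrites df['flag'] on every iteration, so only the result for the last column survives. Instead, you can use the DataFrame's apply method with axis=1 to evaluate each row across all its columns: for every row, check whether any keyword is present in any column value.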

df['flag'] = df.apply(lambda row: 1 if any(keyword in str(row[col]) for col in df.columns for keyword in keywords) else 0, axis=1)

for col in df.columns: Loops through all the columns in the dataframe for a given row.

for keyword in keywords: Iterates through the list of keywords.

any(keyword in str(row[col]) for ...): Returns True as soon as any keyword is found in any column value of the row. Wrapping the value in str(...) keeps the substring check working for non-string data types.
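The loop from the question can also be kept, provided each column's result is OR-ed into the flag instead of replacing it. This is a minimal sketch that assumes the same sample data and keywords as the question; the name `hit` is only illustrative.

```python
import pandas as pd

# Sample data and keywords copied from the question
data = {'column1': ['sp123', 'abc', 'sp456', 'def'],
        'column2': ['tp1234', 'abc', 'sp4256', 'def'],
        'column3': ['syp123', 'abc', 'sp456', 'def']}
df = pd.DataFrame(data)
keywords = ['sp', 'xyz']

# Start with "no match" for every row
df['flag'] = 0

for col in data.keys():
    # 1 where this column contains any keyword, else 0
    hit = df[col].apply(
        lambda x: 1 if any(keyword in str(x) for keyword in keywords) else 0
    )
    # Combine with earlier columns instead of overwriting them
    df['flag'] = df['flag'] | hit

print(df['flag'].tolist())  # [1, 0, 1, 0]
```

Iterating over `data.keys()` rather than `df.columns` keeps the new `flag` column itself out of the search.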

Output of the apply one-liner with your data and keywords:

  column1 column2 column3  flag
0   sp123  tp1234  syp123     1
1     abc     abc     abc     0
2   sp456  sp4256   sp456     1
3     def     def     def     0
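For larger frames, a row-wise apply can be slow. One possible alternative, not part of the answer above, is to match column by column with pandas string methods and then reduce across columns. The sketch below assumes the same sample data; `pattern` and `matches` are illustrative names, and the keywords are escaped so they are matched literally rather than as regular expressions.

```python
import re
import pandas as pd

# Sample data and keywords copied from the question
data = {'column1': ['sp123', 'abc', 'sp456', 'def'],
        'column2': ['tp1234', 'abc', 'sp4256', 'def'],
        'column3': ['syp123', 'abc', 'sp456', 'def']}
df = pd.DataFrame(data)
keywords = ['sp', 'xyz']

# Build one regex that matches any keyword literally: "sp|xyz"
pattern = '|'.join(re.escape(k) for k in keywords)

# Boolean frame: True where a cell contains at least one keyword
columns = list(data.keys())
matches = df[columns].apply(
    lambda s: s.astype(str).str.contains(pattern, regex=True)
)

# A row is flagged if any of its columns matched
df['flag'] = matches.any(axis=1).astype(int)

print(df['flag'].tolist())  # [1, 0, 1, 0]
```

Here `apply` runs once per column rather than once per row, so the per-cell work is handled by `Series.str.contains`.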
