
I'm looking for a way to drop columns whose names contain a specific word without using loops. My current DataFrame doesn't have many columns, but I know there are many methods that avoid loops and do exactly the same thing much faster (like vectorization when creating a new column from existing ones).

I want to learn to do things properly.

# drop every column whose name contains 'post', except 'post_title'
for col in df_web.columns:
    if 'post' in col and col != 'post_title':
        df_web.drop(columns=col, inplace=True)

I could also use a list comprehension, but that still uses a for loop:

my_col = [col for col in df_web.columns if not col.startswith("post") or col == 'post_title']
df_web = df_web.loc[:, my_col]

Here is the original list of columns in my DataFrame:

['sku', 'total_sales', 'post_author', 'post_date', 'post_date_gmt', 'product_type', 'post_title', 'post_excerpt', 'post_name', 'post_modified', 'post_modified_gmt', 'guid', 'post_type']
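The comments ask for a minimal reproducible example. Here is a sketch of one. It assumes an empty DataFrame with exactly these column names, because no data was given and only the names matter for column selection. It runs the loop from above and prints the columns that survive.

```python
import pandas as pd

# Assumed setup: an empty DataFrame whose column names match the list above.
cols = ['sku', 'total_sales', 'post_author', 'post_date', 'post_date_gmt',
        'product_type', 'post_title', 'post_excerpt', 'post_name',
        'post_modified', 'post_modified_gmt', 'guid', 'post_type']
df_web = pd.DataFrame(columns=cols)

# The loop from the question. df_web.columns is evaluated once, and
# drop(..., inplace=True) swaps in a new Index rather than mutating the
# (immutable) Index being iterated, so every original name is visited.
for col in df_web.columns:
    if 'post' in col and col != 'post_title':
        df_web.drop(columns=col, inplace=True)

print(list(df_web.columns))
# ['sku', 'total_sales', 'product_type', 'post_title', 'guid']
```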

Asked by thalback.
  • Please read: minimal reproducible example – Panda Kim
  • Understood. Whenever I post about a problem or an error, I will also include the code needed to reproduce it. – thalback

1 Answer


You could create a boolean mask of the columns to drop:

m = df_web.columns.str.startswith('post_') & (df_web.columns != 'post_title')
# array([False, False,  True,  True,  True, False, False,  True,  True,
#         True,  True, False,  True])

df_web.drop(columns=df_web.columns[m], inplace=True)
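Here is a self-contained version of the mask approach. It assumes the question's column names on an empty DataFrame, since no data was given. It also prints the mask so you can compare it with the column list. True marks a column that will be dropped.

```python
import pandas as pd

# Assumed setup: an empty DataFrame with the question's column names.
cols = ['sku', 'total_sales', 'post_author', 'post_date', 'post_date_gmt',
        'product_type', 'post_title', 'post_excerpt', 'post_name',
        'post_modified', 'post_modified_gmt', 'guid', 'post_type']
df_web = pd.DataFrame(columns=cols)

# True for every 'post_*' column except 'post_title', i.e. the columns to drop.
m = df_web.columns.str.startswith('post_') & (df_web.columns != 'post_title')
print(m.tolist())

# Index the column labels with the mask, then drop them in one call.
df_web.drop(columns=df_web.columns[m], inplace=True)
print(list(df_web.columns))
```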

Or use boolean indexing to select the columns to keep:

df_web = df_web.loc[:, (df_web.columns=='post_title')
                       | ~df_web.columns.str.startswith('post_')]
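Here is a runnable sketch of the boolean-indexing variant, under the same assumption of an empty DataFrame with the question's column names. The mask is the inverse of the drop mask: it keeps 'post_title' and every column that does not start with 'post_'. Because the result is assigned rather than modified in place, you could bind it to a new name and keep the original DataFrame intact.

```python
import pandas as pd

# Assumed setup: an empty DataFrame with the question's column names.
cols = ['sku', 'total_sales', 'post_author', 'post_date', 'post_date_gmt',
        'product_type', 'post_title', 'post_excerpt', 'post_name',
        'post_modified', 'post_modified_gmt', 'guid', 'post_type']
df_web = pd.DataFrame(columns=cols)

# Keep mask: 'post_title' itself, or any column not starting with 'post_'.
keep = (df_web.columns == 'post_title') | ~df_web.columns.str.startswith('post_')
df_web = df_web.loc[:, keep]
print(list(df_web.columns))
```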

Or use filter with a (nested) negative lookahead:

df_web = df_web.filter(regex='^(?!post_(?!title))')
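DataFrame.filter(regex=...) keeps a column when re.search finds the pattern in its name. The pattern '^(?!post_(?!title))' matches at the start of any name, unless that name begins with 'post_' and the next characters are not 'title'. The sketch below checks this with Python's re module and then with filter. The list comprehension is only there to show the regex behaviour, not as a selection method. As before, it assumes an empty DataFrame with the question's column names.

```python
import re

import pandas as pd

# Assumed setup: an empty DataFrame with the question's column names.
cols = ['sku', 'total_sales', 'post_author', 'post_date', 'post_date_gmt',
        'product_type', 'post_title', 'post_excerpt', 'post_name',
        'post_modified', 'post_modified_gmt', 'guid', 'post_type']
df_web = pd.DataFrame(columns=cols)

# The outer (?!...) rejects names starting with 'post_', unless the inner
# (?!title) fails first because 'title' directly follows 'post_'.
pattern = r'^(?!post_(?!title))'

# Plain-re check of which names the pattern matches (illustration only).
matched = [c for c in cols if re.search(pattern, c)]
print(matched)

# The same selection done by pandas (filters columns by default for a DataFrame).
df_web = df_web.filter(regex=pattern)
print(list(df_web.columns))
```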


Output columns:

Index(['sku', 'total_sales', 'product_type', 'post_title', 'guid'], dtype='object')
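All three variants above test whether a name starts with 'post_'. The question's loop tests something broader: whether 'post' appears anywhere in the name ('post' in col). If that broader rule is what you want, Index.str.contains builds the same kind of vectorized mask, and regex=False makes it a plain substring test. The sketch below uses the question's columns plus a hypothetical 'repost_count' column, added only to show where the two rules differ.

```python
import pandas as pd

# Assumed setup: the question's columns plus a hypothetical 'repost_count'
# column (not in the question) to show where the two rules disagree.
cols = ['sku', 'total_sales', 'post_author', 'post_date', 'post_date_gmt',
        'product_type', 'post_title', 'post_excerpt', 'post_name',
        'post_modified', 'post_modified_gmt', 'guid', 'post_type',
        'repost_count']
df_web = pd.DataFrame(columns=cols)

not_title = df_web.columns != 'post_title'
# Substring rule, like "'post' in col" in the question's loop.
contains_mask = df_web.columns.str.contains('post', regex=False) & not_title
# Prefix rule, as used in the answer.
prefix_mask = df_web.columns.str.startswith('post_') & not_title

print(list(df_web.columns[~contains_mask]))  # substring rule drops 'repost_count'
print(list(df_web.columns[~prefix_mask]))    # prefix rule keeps 'repost_count'
```

With only the question's original columns, both rules keep the same five columns, because every name containing 'post' there starts with 'post_'.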

Tags: python, Deletion of columns that contain a specific word in the column name, Stack Overflow