
What's the fastest Python way to create new columns for the sum of the following:

In my dataframe, there is an unknown number of columns named like carbrands_0_type, carbrands_1_type, etc. Each such column holds a single string value, e.g. "BMW", and there are corresponding float columns named carbrands_0_quantity, carbrands_1_quantity, etc. that hold the quantity for that type. So if carbrands_0_type is "BMW" and carbrands_0_quantity is 50, I know that for that row (event) I have 50 BMWs.

The thing is, the car brands do not appear in any fixed column and can land anywhere, so "BMW" may show up in carbrands_15_type / carbrands_15_quantity for the next row.
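To make the layout concrete, here is a small illustrative dataframe (the values are made up; unused slots are NaN):

```python
import pandas as pd
import numpy as np

# Hypothetical sample: brand/quantity pairs can sit in any column slot,
# and a slot unused for a given row is NaN.
data = pd.DataFrame({
    'carbrands_0_type':      ['BMW',  'Audi'],
    'carbrands_0_quantity':  [50.0,   20.0],
    'carbrands_1_type':      ['Audi', np.nan],
    'carbrands_1_quantity':  [10.0,   np.nan],
    'carbrands_15_type':     [np.nan, 'BMW'],
    'carbrands_15_quantity': [np.nan, 5.0],
})
```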

Tentatively, I would need to take the string name, e.g. "Audi", and create a new column named 'Audi' holding the corresponding quantity for the entire dataframe. What I have done is the following:

def convert_sum_type_quantity(row, df, start_string, end_str1, end_str2, character):
    total_sum = 0
    # Count how many carbrands_<i>_type columns exist, to know how far to iterate
    val = len([x for x in df.columns if x.startswith(start_string) and x.endswith(end_str1)])

    for i in range(val):
        qnty_col = start_string + '_' + str(i) + '_' + end_str2
        type_col = start_string + '_' + str(i) + '_' + end_str1

        # Add the quantity when the type cell is a string containing the target brand
        if isinstance(row[type_col], str) and character in row[type_col]:
            total_sum += row[qnty_col]

    return total_sum

Then I apply it to the dataframe:

data['audi'] = data.apply(lambda row: convert_sum_type_quantity(row, data, 'carbrands', 'type', 'quantity', 'audi'), axis=1)

It works, but it is draggy and slow, since apply with a lambda processes the dataframe row by row. Moreover, it takes even more time if I want more columns, like BMW or Mercedes, because the whole scan repeats per brand.

Any experts with good advice? Or even better code to get all unique car brands with their corresponding quantities?

P.S. I need the extra named columns for output to non-IT people.
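For what it's worth, one direction I've been sketching (untested at scale; the function name brand_totals and the exact column-naming assumptions are just placeholders) is reshaping the type/quantity pairs to long form and pivoting, so all brands are computed in one pass instead of one apply per brand:

```python
import pandas as pd

def brand_totals(df, prefix='carbrands'):
    """Reshape wide type/quantity column pairs to long form, then pivot
    so each brand becomes its own column of summed quantities per row."""
    type_cols = [c for c in df.columns
                 if c.startswith(prefix) and c.endswith('_type')]
    pieces = []
    for tcol in type_cols:
        qcol = tcol.replace('_type', '_quantity')
        pieces.append(pd.DataFrame({
            'brand': df[tcol],
            'quantity': df[qcol],
            'row': df.index,
        }))
    # Stack all pairs vertically, dropping slots with no brand
    long = pd.concat(pieces).dropna(subset=['brand'])
    # One column per brand; rows realigned with the original index
    wide = long.pivot_table(index='row', columns='brand',
                            values='quantity', aggfunc='sum', fill_value=0)
    return wide.reindex(df.index, fill_value=0)
```

The result could then be joined back onto the original dataframe with df.join(brand_totals(df)) to get the named columns for the non-IT output.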
