Starting from this dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(3*4).reshape((4, 3)),
    index=['a', 'b', 'c', 'd'],
    columns=['A', 'B', 'C']
)
print(df)
   A   B   C
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
I want to apply two functions to each column to generate two columns for each original column to obtain this shape, with a multiindex column nested below each original column:
    A        B        C
    x    y   x    y   x    y
a  10  100  11  101  12  102
b  13  103  14  104  15  105
c  16  106  17  107  18  108
d  19  109  20  110  21  111
However, something like this doesn't work:
df.apply(lambda series:
    series.transform([lambda x: x+10, lambda x: x+100])
)
It raises ValueError: If using all scalar values, you must pass an index
Note that I do not want to use agg like in this answer, since this is not an aggregation. I also want to avoid referring to column names directly.
Asked Nov 20, 2024 at 19:56 by goweon; edited Nov 20, 2024 at 20:10 by wjandrea.
2 Answers
You just need to use df.transform() and give your functions names.
def x(k):
    return k + 10

def y(k):
    return k + 100

df.transform([x, y])
    A        B        C
    x    y   x    y   x    y
a  10  100  11  101  12  102
b  13  103  14  104  15  105
c  16  106  17  107  18  108
d  19  109  20  110  21  111
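As a quick check of the result above, the sketch below rebuilds the frame and reads the nested columns back. The variable names out, a_cols and x_only are only illustrative, and using xs to pull out one function's columns is an addition here, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(3 * 4).reshape((4, 3)),
    index=['a', 'b', 'c', 'd'],
    columns=['A', 'B', 'C'],
)

def x(k):
    return k + 10

def y(k):
    return k + 100

# Each function's name becomes the inner level of the column MultiIndex.
out = df.transform([x, y])

# Select both results for one original column.
a_cols = out['A']

# Select one function's result across all original columns.
x_only = out.xs('x', axis=1, level=1)

print(out.columns.tolist())
print(a_cols.columns.tolist())
print(x_only)
```

Because the inner labels come from the function names, two anonymous lambdas cannot be told apart, which is why the answer defines named functions.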
SOLUTION 1
A possible solution, whose steps are:
First, it creates two new dataframes: one that adds 10 to each element, and another that adds 100 to each element.
Then, it concatenates these dataframes along the columns using pd.concat with axis=1 and assigns keys ['x', 'y'] to create a hierarchical column index.
Finally, swaplevel is applied to swap the levels of the column MultiIndex, followed by sort_index to sort the columns.
(pd.concat([df + 10, df + 100], axis=1, keys=['x', 'y'])
.swaplevel(axis=1).sort_index(axis=1))
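The concat-based steps above can be extended to any number of functions. The following is a minimal sketch, assuming the functions are kept in a dict (the name funcs is hypothetical) whose keys become the inner column level; passing a dict to pd.concat uses its keys the same way the keys argument does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(3 * 4).reshape((4, 3)),
    index=['a', 'b', 'c', 'd'],
    columns=['A', 'B', 'C'],
)

# Hypothetical mapping from inner column label to a function applied to the whole frame.
funcs = {'x': lambda d: d + 10, 'y': lambda d: d + 100}

result = (
    pd.concat({name: f(df) for name, f in funcs.items()}, axis=1)  # outer level: x, y
    .swaplevel(axis=1)    # move the original column names to the top level
    .sort_index(axis=1)   # group the x/y pairs under each original column
)

print(result)
```

One caveat: sort_index orders the top level alphabetically, so if the original columns are not already sorted, their order in the result will differ from the order in df.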
SOLUTION 2
Another possible solution, whose steps are:
It first creates two new dataframes: one where 10 is added to each element (df + 10) and another where 100 is added (df + 100).
These two dataframes are combined into a 3D numpy array using stack with axis=2, resulting in an array where the third dimension stacks the two transformations.
The array is then reshaped into a two-dimensional array with the same number of rows as df.
A new dataframe is created from this reshaped array, with columns assigned a hierarchical index using pd.MultiIndex.from_product.
pd.DataFrame(
    np.stack([df + 10, df + 100], axis=2).reshape(df.shape[0], -1),
    index=df.index,  # keep the original row labels a..d
    columns=pd.MultiIndex.from_product([df.columns, ['x', 'y']]))
Output:
    A        B        C
    x    y   x    y   x    y
a  10  100  11  101  12  102
b  13  103  14  104  15  105
c  16  106  17  107  18  108
d  19  109  20  110  21  111
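To make the stack-and-reshape step in Solution 2 easier to follow, here is a small sketch that prints the intermediate shapes. The names stacked and flat are illustrative, and the explicit to_numpy() calls are an assumption added for clarity; the answer passes the dataframes to np.stack directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(3 * 4).reshape((4, 3)),
    index=['a', 'b', 'c', 'd'],
    columns=['A', 'B', 'C'],
)

# Stack the two transformed frames along a new last axis: (rows, columns, functions).
stacked = np.stack([(df + 10).to_numpy(), (df + 100).to_numpy()], axis=2)

# Merge the last two axes so each original column's x/y values sit side by side.
flat = stacked.reshape(df.shape[0], -1)

print(stacked.shape)     # (4, 3, 2)
print(flat.shape)        # (4, 6)
print(flat[0].tolist())  # [10, 100, 11, 101, 12, 102]
```

The column order of flat (A-x, A-y, B-x, B-y, C-x, C-y) is what lets pd.MultiIndex.from_product([df.columns, ['x', 'y']]) label it correctly.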