I have a Pandas dataframe:
import pandas as pd
import numpy as np
np.random.seed(150)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=['A', 'B'])
I want to add a new column "C" whose value in each row is a list of the current value and the two preceding values of column "B". The following method does what I need, but it is slow when the data is large.
>>> df['C'] = [df['B'].iloc[i-2:i+1].tolist() if i >= 2 else None for i in range(len(df))]
>>> df
A B C
0 4 9 None
1 0 2 None
2 4 5 [9, 2, 5]
3 7 9 [2, 5, 9]
4 8 3 [5, 9, 3]
5 8 1 [9, 3, 1]
6 1 4 [3, 1, 4]
7 4 1 [1, 4, 1]
8 1 9 [4, 1, 9]
9 3 7 [1, 9, 7]
When I try to use rolling(...).apply instead, I get an error message:
df['C'] = df['B'].rolling(window=3).apply(lambda x: list(x), raw=False)
TypeError: must be real number, not list
As far as I remember, Pandas apply cannot return a list here, so how do I do this? I searched the forum but couldn't find a suitable topic about apply returning a list.
4 Answers
You can use numpy's sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view as swv
N = 3
df['C'] = pd.Series(swv(df['B'], N).tolist(), index=df.index[N-1:])
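Output (the first N-1 rows are NaN because the new Series is aligned on the index and has no entries for them):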
A B C
0 4 9 NaN
1 0 2 NaN
2 4 5 [9, 2, 5]
3 7 9 [2, 5, 9]
4 8 3 [5, 9, 3]
5 8 1 [9, 3, 1]
6 1 4 [3, 1, 4]
7 4 1 [1, 4, 1]
8 1 9 [4, 1, 9]
9 3 7 [1, 9, 7]
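If the same operation is needed for several columns or window sizes, the approach above can be wrapped in a small function. This is only a sketch: rolling_lists is a hypothetical helper name (not part of pandas or numpy), it fills incomplete windows with None instead of NaN, and it assumes numpy 1.20 or later, where sliding_window_view is available.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_lists(s, n):
    """Return a Series whose rows hold the current value of `s` and its
    n-1 predecessors as a list; rows without a full window hold None.

    Hypothetical helper for illustration, not a pandas/numpy function.
    """
    values = [None] * len(s)
    # sliding_window_view raises if the window is longer than the array,
    # so short inputs are simply left as all-None.
    if len(s) >= n:
        values[n - 1:] = sliding_window_view(s.to_numpy(), n).tolist()
    return pd.Series(values, index=s.index, dtype=object)


df = pd.DataFrame({'B': [9, 2, 5, 9, 3]})
df['C'] = rolling_lists(df['B'], 3)
print(df)
```

Building a plain Python list first and only then constructing the Series sidesteps index alignment, which is why the leading rows come out as None here.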
You could approach this the other way around: instead of building each window row by row, build the windows column by column from N shifted slices of 'B'. This will probably speed up your code unless the window size N is large.
For example, you can try
N = 3
nr = len(df)
df['C'] = [None]*(N-1) + np.column_stack([df['B'].iloc[k:nr-N+1+k] for k in range(N)]).tolist()
and you will obtain
A B C
0 4 9 None
1 0 2 None
2 4 5 [9, 2, 5]
3 7 9 [2, 5, 9]
4 8 3 [5, 9, 3]
5 8 1 [9, 3, 1]
6 1 4 [3, 1, 4]
7 4 1 [1, 4, 1]
8 1 9 [4, 1, 9]
9 3 7 [1, 9, 7]
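Whether the column-wise version is actually faster depends on the data size and on N, so it is worth measuring. Here is a rough timing sketch, assuming a 5,000-row frame (an arbitrary size) and hypothetical function names per_row and per_column; the numbers will vary between machines and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Larger random frame than in the question; the size is an arbitrary choice.
rng = np.random.default_rng(150)
big = pd.DataFrame(rng.integers(0, 10, size=(5_000, 2)), columns=['A', 'B'])
N = 3


def per_row(df):
    # Approach from the question: one iloc slice per row.
    return [df['B'].iloc[i - N + 1:i + 1].tolist() if i >= N - 1 else None
            for i in range(len(df))]


def per_column(df):
    # Column-wise approach: N shifted slices stacked side by side.
    nr = len(df)
    cols = [df['B'].iloc[k:nr - N + 1 + k] for k in range(N)]
    return [None] * (N - 1) + np.column_stack(cols).tolist()


# Check that both approaches agree before comparing their speed.
same = per_row(big) == per_column(big)
t_row = timeit.timeit(lambda: per_row(big), number=1)
t_col = timeit.timeit(lambda: per_column(big), number=1)
print(f"same result: {same}, per_row: {t_row:.3f}s, per_column: {t_col:.3f}s")
```

The column-wise version does N slices in total instead of one slice per row, which is where any speedup would come from.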
The code below takes the 'B' column of the DataFrame as a NumPy array and forms sliding windows of size three over it. Each window is stored as a list in a new column 'C'. The first two rows of 'C' are None because they do not have enough preceding elements to form a full window. The function sliding_window_view makes this easy: it creates views of the original array instead of copying data (a copy is made only when tolist() converts the windows to Python lists).
import pandas as pd
import numpy as np
# Use sliding_window_view for fast rolling window extraction
from numpy.lib.stride_tricks import sliding_window_view
# Sample Data
np.random.seed(150)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=['A', 'B'])
print(df)
'''
A B
0 4 9
1 0 2
2 4 5
3 7 9
4 8 3
5 8 1
6 1 4
7 4 1
8 1 9
9 3 7
'''
# Convert column to NumPy array
B_values = df['B'].to_numpy()
# Apply sliding window:
# imagine a window of size 3 sliding across the array.
# For each position of the window, it extracts the elements
# within the window.
windows = sliding_window_view(B_values, window_shape=3)
# Create a new column, filling the first two rows with None
df['C'] = [None, None] + windows.tolist()
print(df.head(10))
'''
A B C
0 4 9 None
1 0 2 None
2 4 5 [9, 2, 5]
3 7 9 [2, 5, 9]
4 8 3 [5, 9, 3]
5 8 1 [9, 3, 1]
6 1 4 [3, 1, 4]
7 4 1 [1, 4, 1]
8 1 9 [4, 1, 9]
9 3 7 [1, 9, 7]
'''
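The statement that sliding_window_view returns views rather than copies can be checked directly. A small sketch, using the 'B' values from the question typed in by hand:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

b = np.array([9, 2, 5, 9, 3, 1, 4, 1, 9, 7])  # column 'B' from the question
windows = sliding_window_view(b, window_shape=3)

print(windows.shape)            # (8, 3): 10 - 3 + 1 windows of length 3
shares = np.shares_memory(windows, b)
print(shares)                   # True: the windows reuse b's buffer
print(windows.flags.writeable)  # False: the view is read-only by default

# tolist() is the step that copies the data into independent Python lists.
as_lists = windows.tolist()
print(as_lists[0])              # [9, 2, 5]
```

Because the view is read-only by default, the windows cannot be modified in place by accident; the lists stored in column 'C' are independent copies.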
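Here is another way, iterating over the rolling windows directly (assign returns a new DataFrame, so keep the result if you need it):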
df.assign(C = [s.tolist() if len(s) == 3 else None for s in df['B'].rolling(3)])
Output:
A B C
0 4 9 None
1 0 2 None
2 4 5 [9, 2, 5]
3 7 9 [2, 5, 9]
4 8 3 [5, 9, 3]
5 8 1 [9, 3, 1]
6 1 4 [3, 1, 4]
7 4 1 [1, 4, 1]
8 1 9 [4, 1, 9]
9 3 7 [1, 9, 7]
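The len(s) == 3 check is needed because iterating over a rolling object also yields the shorter windows at the start of the Series. A short sketch on a five-element Series (the first values of 'B'), assuming a pandas version that supports iterating over rolling objects (1.1 or later):

```python
import pandas as pd

s = pd.Series([9, 2, 5, 9, 3], name='B')

# Each iteration yields the window ending at that row as a Series.
lengths = [len(w) for w in s.rolling(3)]
print(lengths)  # [1, 2, 3, 3, 3]

# Keeping only full windows reproduces the None padding in the output above.
lists = [w.tolist() if len(w) == 3 else None for w in s.rolling(3)]
print(lists)    # [None, None, [9, 2, 5], [2, 5, 9], [5, 9, 3]]
```

This avoids rolling(...).apply entirely, which sidesteps the TypeError from the question: apply expects the function to return a single number per window.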