I have a DataFrame to which I would like to add a column of values taken from an array of tuples. Each tuple holds a (position, value) pair. An example:
import pandas as pd
import numpy as np
alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]  # 'A'..'K'
dt = pd.date_range(start='2025-1-1', freq='1h', periods=len(alpha))
df = pd.DataFrame(alpha, index=dt)
df.index.name = 'timestamp'
df.columns = ['item']
# (position, value) pairs to place in a new column
c = np.array([(1, 100), (2, 202), (6, 772)])
which gives:
timestamp | item |
---|---|
2025-01-01 00:00:00 | A |
2025-01-01 01:00:00 | B |
2025-01-01 02:00:00 | C |
2025-01-01 03:00:00 | D |
2025-01-01 04:00:00 | E |
2025-01-01 05:00:00 | F |
2025-01-01 06:00:00 | G |
2025-01-01 07:00:00 | H |
2025-01-01 08:00:00 | I |
2025-01-01 09:00:00 | J |
2025-01-01 10:00:00 | K |
I am trying to add the values in c as a new column, positioned so that row 1 (the second row) contains B and 100.
I have accomplished what I want with the following:
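# Temporarily replace the timestamps with a 0..n-1 positional index.
df.reset_index(inplace=True)
df.index.name = 'pos'
# Assign each value at its position (creates 'price', NaN elsewhere).
for x, y in c:
    df.loc[int(x), 'price'] = y
# Restore the timestamp index (the positional index is discarded).
df.set_index("timestamp", inplace=True)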
This gave me the desired results:
timestamp | item | price |
---|---|---|
2025-01-01 00:00:00 | A | nan |
2025-01-01 01:00:00 | B | 100 |
2025-01-01 02:00:00 | C | 202 |
2025-01-01 03:00:00 | D | nan |
2025-01-01 04:00:00 | E | nan |
2025-01-01 05:00:00 | F | nan |
2025-01-01 06:00:00 | G | 772 |
2025-01-01 07:00:00 | H | nan |
2025-01-01 08:00:00 | I | nan |
2025-01-01 09:00:00 | J | nan |
2025-01-01 10:00:00 | K | nan |
However, the idea of dropping and recreating the index for this feels a bit awkward, especially if I have multiple indexes.
My question: is there a better way than dropping and recreating the index to add a column with missing values by position?
asked Mar 13 at 1:38 by Mansour
2 Answers
Index.take returns the index labels at the given positions, so you can pass it the first column of your array (the positions) to get the matching timestamps, then assign the second column (the values) with loc:
df.loc[df.index.take(c[:, 0]), 'price'] = c[:, 1]
You can also combine loc and iloc: iloc selects the rows by position, and their .index supplies the labels that loc needs.
df.loc[df.iloc[c[:, 0]].index, 'price'] = c[:, 1]
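A self-contained sketch that rebuilds the question's frame and checks that the Index.take form and the iloc-then-loc form give identical results. The names base, df_take and df_iloc are illustrative only; the expected column (NaN except at positions 1, 2 and 6) comes from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example frame and (position, value) array.
alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]
dt = pd.date_range(start='2025-1-1', freq='1h', periods=len(alpha))
base = pd.DataFrame({'item': alpha}, index=dt)
base.index.name = 'timestamp'
c = np.array([(1, 100), (2, 202), (6, 772)])

# Index.take: positions -> index labels, then label-based assignment.
df_take = base.copy()
df_take.loc[df_take.index.take(c[:, 0]), 'price'] = c[:, 1]

# iloc picks the rows by position; their .index gives the labels for loc.
df_iloc = base.copy()
df_iloc.loc[df_iloc.iloc[c[:, 0]].index, 'price'] = c[:, 1]

# Raises AssertionError if the two results differ in values, dtypes or index.
pd.testing.assert_frame_equal(df_take, df_iloc)
```

Because the new column has gaps, pandas stores it as float64, which is why the values show up as 100.0 rather than 100.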
End result:
item price
2025-01-01 00:00:00 A NaN
2025-01-01 01:00:00 B 100.0
2025-01-01 02:00:00 C 202.0
2025-01-01 03:00:00 D NaN
2025-01-01 04:00:00 E NaN
2025-01-01 05:00:00 F NaN
2025-01-01 06:00:00 G 772.0
2025-01-01 07:00:00 H NaN
2025-01-01 08:00:00 I NaN
2025-01-01 09:00:00 J NaN
2025-01-01 10:00:00 K NaN
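If the column is created first, the assignment can also be done with iloc alone. Because iloc never looks at the index labels, the same two lines should work unchanged with a MultiIndex. This is a minimal sketch that assumes the question's df and c. Pre-filling price with NaN is an extra step that the answer above does not use.

```python
import numpy as np
import pandas as pd

alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]
dt = pd.date_range(start='2025-1-1', freq='1h', periods=len(alpha))
df = pd.DataFrame({'item': alpha}, index=dt)
df.index.name = 'timestamp'
c = np.array([(1, 100), (2, 202), (6, 772)])

# Create the column first (all NaN, float64) so it has a column position.
df['price'] = np.nan
# Purely positional assignment: rows from c[:, 0], column via get_loc.
df.iloc[c[:, 0], df.columns.get_loc('price')] = c[:, 1]
print(df)
```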
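You can also build a key from the hour of each timestamp in the index and left-merge it with a DataFrame created from the array of tuples. This works here because each row's position equals its timestamp's hour (the index starts at midnight, steps hourly, and has fewer than 24 rows).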
df.reset_index().assign(key=df.index.hour).merge(pd.DataFrame(c, columns=['key', 'price']), how='left')
Output:
timestamp item key price
0 2025-01-01 00:00:00 A 0 NaN
1 2025-01-01 01:00:00 B 1 100.0
2 2025-01-01 02:00:00 C 2 202.0
3 2025-01-01 03:00:00 D 3 NaN
4 2025-01-01 04:00:00 E 4 NaN
5 2025-01-01 05:00:00 F 5 NaN
6 2025-01-01 06:00:00 G 6 772.0
7 2025-01-01 07:00:00 H 7 NaN
8 2025-01-01 08:00:00 I 8 NaN
9 2025-01-01 09:00:00 J 9 NaN
10 2025-01-01 10:00:00 K 10 NaN
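Because index.hour matches the row position only for this particular index, a position-based key is the more general choice. This minimal sketch assumes the question's df and c and makes two changes. It swaps merge for DataFrame.join with on=, which keeps the calling frame's timestamp index, so no reset is needed. The name pos is a hypothetical helper column that is dropped afterwards.

```python
import numpy as np
import pandas as pd

alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]
dt = pd.date_range(start='2025-1-1', freq='1h', periods=len(alpha))
df = pd.DataFrame({'item': alpha}, index=dt)
df.index.name = 'timestamp'
c = np.array([(1, 100), (2, 202), (6, 772)])

# Lookup table keyed by position; 'pos' is a hypothetical helper name.
prices = pd.DataFrame(c, columns=['pos', 'price']).set_index('pos')

out = (
    df.assign(pos=np.arange(len(df)))  # 0..n-1 instead of index.hour
      .join(prices, on='pos')          # left join; keeps df's timestamp index
      .drop(columns='pos')
)
print(out)
```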
Or, even shorter:
df.assign(price=df.index.hour.map(pd.Series(c[:,1], index=c[:,0])))
Output:
item price
timestamp
2025-01-01 00:00:00 A NaN
2025-01-01 01:00:00 B 100.0
2025-01-01 02:00:00 C 202.0
2025-01-01 03:00:00 D NaN
2025-01-01 04:00:00 E NaN
2025-01-01 05:00:00 F NaN
2025-01-01 06:00:00 G 772.0
2025-01-01 07:00:00 H NaN
2025-01-01 08:00:00 I NaN
2025-01-01 09:00:00 J NaN
2025-01-01 10:00:00 K NaN
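The same one-liner can map row positions instead of hours, so it no longer depends on what the timestamps are. This is a minimal sketch that assumes the question's df and c. The only change is that pd.RangeIndex(len(df)) supplies 0..n-1 in place of df.index.hour. The name lookup is illustrative.

```python
import numpy as np
import pandas as pd

alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]
dt = pd.date_range(start='2025-1-1', freq='1h', periods=len(alpha))
df = pd.DataFrame({'item': alpha}, index=dt)
df.index.name = 'timestamp'
c = np.array([(1, 100), (2, 202), (6, 772)])

# Values keyed by position: 1 -> 100, 2 -> 202, 6 -> 772.
lookup = pd.Series(c[:, 1], index=c[:, 0])

# Map every row position 0..n-1 through the lookup; unmatched positions give NaN.
# The result is an Index, so assign places it by order rather than by label.
out = df.assign(price=pd.RangeIndex(len(df)).map(lookup))
print(out)
```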
Title: pandas - Add column with missing values by position to timeseries - Stack Overflow. Source: http://www.betaflare.com/web/1744723461a2621833.html