Here is my sample data:
                     indicator1
company  date
company1 2015-01-01        97.0
         2016-01-01        55.0
         2017-01-01        47.0
         2018-01-01        68.0
         2019-01-01        65.0
company2 2015-01-01        22.0
         2016-01-01        40.0
         2017-01-01        22.0
         2018-01-01        12.0
         2019-01-01        86.0
company3 2015-01-01        47.0
         2016-01-01        28.0
         2017-01-01        91.0
         2018-01-01        63.0
         2018-05-01       123.0
         2019-01-01        57.0
I'm trying to calculate 1-year pct_chng this way:
df["pct_chng_3"] = df.groupby("company", group_keys=False)\
.apply(lambda x: x['indicator1'].pct_change(periods = period, freq = 'Y'))
It works fine without the freq parameter (it just computes pct_change row by row), but as soon as I add freq = 'Y' I get this error:
new_ax = index.shift(periods, freq)
NotImplementedError: This method is only implemented for DatetimeIndex, PeriodIndex and TimedeltaIndex; Got type MultiIndex
I presume this happens because groupby leaves the two-level index in place, which the shift method cannot handle.
I can't figure out a nice workaround.
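To make the problem reproducible, here is a minimal sketch that rebuilds the sample data and triggers the same error on a single group. The way df is constructed and the names rows, group, raised and flat are assumptions for illustration only. pd.DateOffset(years=1) is used in place of 'Y' so that the sketch does not depend on how a given pandas version interprets that alias.

```python
import pandas as pd

# Sample data from the question, rebuilt as a (company, date) MultiIndex frame.
rows = [
    ("company1", "2015-01-01", 97.0), ("company1", "2016-01-01", 55.0),
    ("company1", "2017-01-01", 47.0), ("company1", "2018-01-01", 68.0),
    ("company1", "2019-01-01", 65.0),
    ("company2", "2015-01-01", 22.0), ("company2", "2016-01-01", 40.0),
    ("company2", "2017-01-01", 22.0), ("company2", "2018-01-01", 12.0),
    ("company2", "2019-01-01", 86.0),
    ("company3", "2015-01-01", 47.0), ("company3", "2016-01-01", 28.0),
    ("company3", "2017-01-01", 91.0), ("company3", "2018-01-01", 63.0),
    ("company3", "2018-05-01", 123.0), ("company3", "2019-01-01", 57.0),
]
df = pd.DataFrame(rows, columns=["company", "date", "indicator1"])
df["date"] = pd.to_datetime(df["date"])
df = df.set_index(["company", "date"])

# One group as groupby(...).apply would see it: the index still has both levels.
group = df.xs("company1", level="company", drop_level=False)["indicator1"]

# Shifting by a frequency needs a datetime-like index, so a MultiIndex fails.
try:
    group.shift(1, freq=pd.DateOffset(years=1))
    raised = False
except NotImplementedError:
    raised = True
print("shift on MultiIndex raised NotImplementedError:", raised)

# After dropping the company level, only the DatetimeIndex is left and
# the frequency-based percent change can be computed.
flat = group.droplevel("company")
print(flat.pct_change(freq=pd.DateOffset(years=1)))
```

This suggests that the fix has to remove the company level from the index before pct_change is called with a frequency.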
Asked Nov 21, 2024 at 11:36 by Arseni; edited Nov 21, 2024 at 11:53 by Mark Rotteveel.

1 Answer
Use DateOffset to specify the frequency. To avoid the error, first convert the first index level, company, to a column with DataFrame.reset_index, then compute pct_change, and finally recreate the MultiIndex:
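import pandas as pd

# Move "company" out of the index so each group has a plain DatetimeIndex
df1 = df.reset_index(level=0)
# Change against the value exactly one year earlier, computed per company
pct = (df1.groupby("company", group_keys=False, sort=False)['indicator1']
          .pct_change(freq=pd.DateOffset(years=1))
          .to_frame('pct_chng_3')
          .set_index(df1['company'], append=True)   # index is now (date, company)
          .swaplevel())                             # back to (company, date)
out = df.join(pct)
print(out)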
                     indicator1  pct_chng_3
company  date
company1 2015-01-01        97.0         NaN
         2016-01-01        55.0   -0.432990
         2017-01-01        47.0   -0.145455
         2018-01-01        68.0    0.446809
         2019-01-01        65.0   -0.044118
company2 2015-01-01        22.0         NaN
         2016-01-01        40.0    0.818182
         2017-01-01        22.0   -0.450000
         2018-01-01        12.0   -0.454545
         2019-01-01        86.0    6.166667
company3 2015-01-01        47.0         NaN
         2016-01-01        28.0   -0.404255
         2017-01-01        91.0    2.250000
         2018-01-01        63.0   -0.307692
         2018-05-01       123.0         NaN
         2019-01-01        57.0   -0.095238
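The output above has NaN for company3 on 2018-05-01, while 2019-01-01 is compared against 2018-01-01 rather than the preceding row. The following sketch, which uses only the company3 values, illustrates why: with a frequency, each value is divided by the value found exactly one year earlier, and a date with no such predecessor gives NaN. The names s, one_year, previous and manual are illustrative, and the manual lookup is only an equivalent way to picture the result, not necessarily how pandas implements it.

```python
import pandas as pd

# company3 from the sample data, indexed by date only
s = pd.Series(
    [47.0, 28.0, 91.0, 63.0, 123.0, 57.0],
    index=pd.to_datetime([
        "2015-01-01", "2016-01-01", "2017-01-01",
        "2018-01-01", "2018-05-01", "2019-01-01",
    ]),
    name="indicator1",
)
one_year = pd.DateOffset(years=1)

# Built-in: percent change against the observation one calendar year earlier
built_in = s.pct_change(freq=one_year)

# Manual equivalent: look up the value at (date - 1 year) for every row.
# Dates that do not exist in the index (2014-01-01, 2017-05-01) yield NaN.
previous = s.reindex(s.index - one_year)
manual = pd.Series(s.to_numpy() / previous.to_numpy() - 1, index=s.index)

print(pd.DataFrame({"built_in": built_in, "manual": manual}))
```

For example, 2019-01-01 gives 57 / 63 - 1, because 63 is the value on 2018-01-01, and the 2018-05-01 row is skipped as a reference point.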
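Another option, which avoids recreating the MultiIndex, is to convert the result to a NumPy array. In my opinion this is less safe, because the values are then assigned by position instead of being aligned on the index: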
df['pct_chng_3'] = (df.reset_index(level=0)
                      .groupby("company", group_keys=False, sort=False)['indicator1']
                      .pct_change(freq=pd.DateOffset(years=1))
                      .to_numpy())
print(df)
                     indicator1  pct_chng_3
company  date
company1 2015-01-01        97.0         NaN
         2016-01-01        55.0   -0.432990
         2017-01-01        47.0   -0.145455
         2018-01-01        68.0    0.446809
         2019-01-01        65.0   -0.044118
company2 2015-01-01        22.0         NaN
         2016-01-01        40.0    0.818182
         2017-01-01        22.0   -0.450000
         2018-01-01        12.0   -0.454545
         2019-01-01        86.0    6.166667
company3 2015-01-01        47.0         NaN
         2016-01-01        28.0   -0.404255
         2017-01-01        91.0    2.250000
         2018-01-01        63.0   -0.307692
         2018-05-01       123.0         NaN
         2019-01-01        57.0   -0.095238
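Since to_numpy() discards the index, the assignment relies on the result having the same row order as df. A small guard before assigning can catch a mismatch early. The sketch below uses a subset of the sample data to stay short; the names flat and pct and the specific checks are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Subset of the sample data: two rows of company2, three rows of company3
idx = pd.MultiIndex.from_tuples(
    [
        ("company2", pd.Timestamp("2015-01-01")),
        ("company2", pd.Timestamp("2016-01-01")),
        ("company3", pd.Timestamp("2018-01-01")),
        ("company3", pd.Timestamp("2018-05-01")),
        ("company3", pd.Timestamp("2019-01-01")),
    ],
    names=["company", "date"],
)
df = pd.DataFrame({"indicator1": [22.0, 40.0, 63.0, 123.0, 57.0]}, index=idx)

flat = df.reset_index(level="company")
pct = (flat.groupby("company", group_keys=False, sort=False)["indicator1"]
           .pct_change(freq=pd.DateOffset(years=1)))

# Guard: positional assignment is only meaningful if the result has one value
# per row of df and its dates come back in the same order as in df.
if len(pct) != len(df) or not pct.index.equals(flat.index):
    raise ValueError("pct_change result is not aligned with df row order")

df["pct_chng_3"] = pct.to_numpy()
print(df)
```

If the guard ever fails, the join-based variant is the safer choice because it aligns on both index levels.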