Problem
I have a smooth shaping (scaling) factor that I need to apply to a stepped time series. Each step has a label and a flat value for its time period.
The result must, in order of priority:
- match the step value on average within each step, i.e.
data.groupby('label')['steps'].mean() == data.groupby('label')['result'].mean()
- be a smooth line
- try to maintain the shape of the shaping factor
I realise there are trade-offs: the top two are must-haves and the third is a nice-to-have.
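The first requirement compares floating-point means, so an exact `==` can fail on rounding alone. A minimal sketch of a tolerance-based check follows, using the column names from the expression above on a made-up two-step frame. The helper name `step_means_match` and the toy values are illustrative assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd


def step_means_match(data: pd.DataFrame, label_col: str = "label",
                     step_col: str = "steps", result_col: str = "result",
                     rtol: float = 1e-9) -> bool:
    """Return True when every label's mean result equals its mean step value."""
    means = data.groupby(label_col)[[step_col, result_col]].mean()
    return bool(np.allclose(means[step_col], means[result_col], rtol=rtol))


# Toy frame (hypothetical values): two flat steps and a result that varies
# around each step level without changing its mean.
toy = pd.DataFrame({
    "label": ["a", "a", "a", "b", "b", "b"],
    "steps": [10.0, 10.0, 10.0, 20.0, 20.0, 20.0],
    "result": [9.0, 10.0, 11.0, 18.0, 20.0, 22.0],
})

ok = step_means_match(toy)
# Shifting the result by 1 breaks the per-step means, so the check should fail.
bad = step_means_match(toy.assign(result=toy["result"] + 1))
```

A check like this could be run on any candidate result column before looking at smoothness at all, since the mean constraint has the highest priority.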
Create data
The code below creates the data
import numpy as np
import random
import pandas as pd
random.seed(9001)
def generate_steps_data(as_of_date: pd.Timestamp = pd.Timestamp.utcnow(),
                        n_of_months: int = 4,
                        n_of_quarters: int = 5,
                        n_of_yr: int = 12,
                        ) -> pd.DataFrame:
    # Monthly, then quarterly, then annual (April year-end) periods ahead of as_of_date.
    list_of_periods = (
        [(as_of_date + pd.Timedelta(days=30 * (1 + n))).to_period('M') for n in range(n_of_months)] +
        [(as_of_date + pd.Timedelta(days=90 * (1 + n))).to_period('Q') for n in range(n_of_quarters)] +
        [(as_of_date + pd.Timedelta(days=365 * (1 + n))).to_period('A-Apr') for n in range(n_of_yr)]
    )
    # One row per period: its start, its end and a synthetic level derived from the week of year.
    steps_table = pd.DataFrame.from_dict({str(p): {"start_date": p.start_time,
                                                   "end_date": p.end_time,
                                                   'level': (((p.start_time.weekofyear + p.end_time.weekofyear) / 2 - 26) ** 2 / 26 + 55),
                                                   }
                                          for p in list_of_periods}, orient='index')
    steps_table['end_date'] = steps_table['end_date'].dt.date
    steps_table['start_date'] = steps_table['start_date'].dt.date
    return steps_table.dropna()
def produce_data() -> pd.DataFrame:
    steps = generate_steps_data()
    # Daily series with one random flat value per period; duplicate days from overlapping periods are collapsed with first().
    steps_series = pd.concat([
        pd.Series(index=pd.date_range(row['start_date'], end=row['end_date'], freq='D'), data=random.random() * 45)
        for ind, row in steps.iterrows()
    ]).resample("D").first()
    dtind = pd.date_range(steps['start_date'].min(), end=steps['end_date'].max(), freq='D')
    # Smooth shaping factor: random noise averaged per quarter, then Akima-interpolated back to daily.
    smooth_series = pd.Series(
        index=dtind, data=(dtind.hour - 12) ** 2 * np.random.normal(loc=10, scale=25, size=len(dtind))
    ).resample("Q-Feb").mean().resample("D").interpolate(method='akima') / 75
    # Normalise to mean 1, then shift so the factor is centred on 0.
    smooth_series /= smooth_series.mean()
    smooth_series -= 1
    # Local name differs from the function name to avoid shadowing produce_data.
    frame = smooth_series.rename('shape').to_frame().join(steps_series.rename('step_series').to_frame())
    frame = frame.assign(predict_trend=steps_series.rolling(365 * 3, center=True).mean().bfill().ffill()).dropna()
    letters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    frame['labels'] = frame['step_series'].map(dict(zip(frame['step_series'].unique(), letters)))
    return frame
My first attempt
I have tried a few different things but have not got much past this:
- extract the long-term trend from the data
- apply the shaping factor to the long-term trend
- use groupby and transform to adjust the data to match each step
However, this ends up with large discontinuities.
data = produce_data()
# extract long term trend from data
data['predict_trend'] = data['step_series'].rolling(365 * 3, center=True).mean().bfill().ffill()
# apply shaping factor to long term trend
data['raw_prediction'] = (data['shape'] + 1) * data['predict_trend']
# use groupby and transform to adjust the data to match each step
data['first_attempt_adjusted'] = (
    data['raw_prediction']
    .multiply(data.groupby('labels')['step_series'].transform('mean'))
    .divide(data.groupby('labels')['raw_prediction'].transform('mean'))
)
data[['step_series', 'raw_prediction', 'first_attempt_adjusted']].plot()
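The jumps come from the correction itself: the ratio of step mean to raw mean is constant within a label and changes abruptly at each label boundary, so a smooth input gets multiplied by a stepped factor. The sketch below reproduces this on made-up data. The toy values and the helper name `boundary_jumps` are illustrative assumptions; only the column names follow the code above.

```python
import numpy as np
import pandas as pd


def boundary_jumps(series: pd.Series, labels: pd.Series) -> pd.Series:
    """Absolute row-to-row change at the first row of each new label."""
    is_boundary = labels.ne(labels.shift())
    is_boundary.iloc[0] = False  # the first row has no predecessor
    return series.diff().abs()[is_boundary]


n = 60
toy = pd.DataFrame({
    "labels": ["a"] * 30 + ["b"] * 30,
    "step_series": [10.0] * 30 + [30.0] * 30,
    # Hypothetical smooth raw prediction: half a sine wave around 20.
    "raw_prediction": 20.0 + 2.0 * np.sin(np.linspace(0.0, np.pi, n)),
})

# Same per-label rescaling as in the first attempt.
grouped = toy.groupby("labels")
factor = (grouped["step_series"].transform("mean")
          / grouped["raw_prediction"].transform("mean"))
toy["adjusted"] = toy["raw_prediction"] * factor

raw_jump = boundary_jumps(toy["raw_prediction"], toy["labels"]).max()
adj_jump = boundary_jumps(toy["adjusted"], toy["labels"]).max()
print(f"factor values: {factor.unique()}")
print(f"jump at boundary, raw: {raw_jump:.4f}, adjusted: {adj_jump:.4f}")
```

The per-label means match the steps exactly, but `factor` takes only one value per label, which is why smoothing the input beforehand does not remove the jump.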
I have tried all sorts of smoothing but have not found a good way to remove the discontinuities.
Please remember: priority #1 is that the average result across each step matches the step value, and priority #2 is to have no discontinuities.
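One possible direction, sketched under assumptions and not taken from the post: replace the stepped multiplicative factor with an additive correction that is chosen to be as smooth as possible while still hitting each step's mean exactly. This is an equality-constrained least-squares problem. The function name `mean_preserving_adjust`, the `ridge` parameter and the toy inputs are hypothetical, and "keeping the shape" is interpreted here as "differing from the raw prediction only by a smooth curve".

```python
import numpy as np


def mean_preserving_adjust(raw, group_ids, targets, ridge=1e-8):
    """Add the smoothest correction c to raw so each group's mean hits its target.

    Minimises ||D2 c||^2 + ridge * ||c||^2 subject to A (raw + c) = targets,
    where D2 takes second differences and A averages within each group.
    group_ids must be integers 0..k-1 and every group must be non-empty.
    """
    raw = np.asarray(raw, dtype=float)
    group_ids = np.asarray(group_ids)
    targets = np.asarray(targets, dtype=float)
    n, k = raw.size, targets.size

    # Second-difference operator, shape (n - 2, n), rows like [1, -2, 1].
    d2 = np.diff(np.eye(n), n=2, axis=0)
    # Averaging matrix: row g holds 1/count on the members of group g.
    a = (group_ids[None, :] == np.arange(k)[:, None]).astype(float)
    a /= a.sum(axis=1, keepdims=True)

    # KKT system of the constrained problem; the small ridge keeps it
    # solvable even with a single group. Dense algebra, so this is only
    # suitable for series of a few thousand points.
    h = 2.0 * (d2.T @ d2 + ridge * np.eye(n))
    kkt = np.block([[h, a.T], [a, np.zeros((k, k))]])
    rhs = np.concatenate([np.zeros(n), targets - a @ raw])
    c = np.linalg.solve(kkt, rhs)[:n]
    return raw + c


# Toy usage (hypothetical values): three steps of 30 points each.
t = np.linspace(0.0, 1.0, 90)
raw = 20.0 + 2.0 * np.sin(2.0 * np.pi * t)
group_ids = np.repeat([0, 1, 2], 30)
targets = np.array([10.0, 30.0, 15.0])

adjusted = mean_preserving_adjust(raw, group_ids, targets)
group_means = np.array([adjusted[group_ids == g].mean() for g in range(3)])
largest_step = np.abs(np.diff(adjusted)).max()
```

With the frame above, `group_ids` could be derived from `pd.factorize(data['labels'])[0]` and `targets` from the per-label mean of `step_series`. The trade-off is that the result may overshoot the step levels near boundaries, since only the means are pinned.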