Problem

I have a smooth shaping/scaling factor that I need to apply to a stepped time series, where each step has a label and a flat value for its time period.

The result must, in order of priority:

  • match the step value on average within each step, i.e. (up to floating-point tolerance)
    • data.groupby('labels')['step_series'].mean() == data.groupby('labels')['result'].mean()
  • be a smooth line (no discontinuities)
  • try to maintain the shape of the shaping factor

I realise there are trade-offs: the first two are must-haves and the third is a nice-to-have.
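Checking the first two requirements numerically can make it easier to compare attempts. The sketch below is a hypothetical helper (check_result is not part of the question). It assumes the column names that produce_data() creates (labels, step_series) and compares the per-step means with np.allclose, because means of floating-point results will rarely be exactly equal under ==. The largest day-to-day change is only a rough proxy for smoothness.

```python
import numpy as np
import pandas as pd


def check_result(data: pd.DataFrame, result_col: str, step_col: str = "step_series",
                 label_col: str = "labels", rtol: float = 1e-9) -> dict:
    """Hypothetical helper: report requirement 1 (per-step means) and a crude check of requirement 2."""
    means = data.groupby(label_col)[[step_col, result_col]].mean()
    # Floating-point means rarely compare exactly equal, so compare them with a tolerance.
    means_match = bool(np.allclose(means[step_col], means[result_col], rtol=rtol))
    # The largest day-to-day change is a simple indicator of a discontinuity.
    max_daily_change = float(data[result_col].diff().abs().max())
    return {"means_match": means_match, "max_daily_change": max_daily_change}


demo = pd.DataFrame({
    "labels": ["a"] * 3 + ["b"] * 3,
    "step_series": [1.0] * 3 + [4.0] * 3,
    "result": [0.5, 1.0, 1.5, 3.0, 4.0, 5.0],
})
print(check_result(demo, "result"))  # → {'means_match': True, 'max_daily_change': 1.5}
```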

Create data

The code below generates the sample data:

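import random

import numpy as np
import pandas as pd

random.seed(9001)     # seeds the stdlib RNG used for the step levels
np.random.seed(9001)  # the shaping factor is drawn with np.random, so seed NumPy as well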

def generate_steps_data(as_of_date: pd.Timestamp = pd.Timestamp.utcnow(),
                        n_of_months: int = 4,
                        n_of_quarters: int = 5,
                        n_of_yr: int = 12,
                        ) -> pd.DataFrame:
    # monthly, then quarterly, then annual (April year-end) periods after as_of_date;
    # these date ranges can overlap
    list_of_periods = (
        [(as_of_date + pd.Timedelta(days=30 * (1 + n))).to_period('M') for n in range(n_of_months)]
        + [(as_of_date + pd.Timedelta(days=90 * (1 + n))).to_period('Q') for n in range(n_of_quarters)]
        + [(as_of_date + pd.Timedelta(days=365 * (1 + n))).to_period('A-Apr') for n in range(n_of_yr)]
    )

    steps_table = pd.DataFrame.from_dict(
        {str(p): {'start_date': p.start_time,
                  'end_date': p.end_time,
                  'level': ((p.start_time.weekofyear + p.end_time.weekofyear) / 2 - 26) ** 2 / 26 + 55,
                  }
         for p in list_of_periods},
        orient='index')

    # keep calendar dates only
    steps_table['end_date'] = steps_table['end_date'].dt.date
    steps_table['start_date'] = steps_table['start_date'].dt.date

    return steps_table.dropna()


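def produce_data() -> pd.DataFrame:
    steps = generate_steps_data()

    # one flat random level per period, expanded to daily values
    steps_series = pd.concat(
        [pd.Series(index=pd.date_range(row['start_date'], end=row['end_date'], freq='D'),
                   data=random.random() * 45)
         for _, row in steps.iterrows()]
    ).resample('D').first()

    dtind = pd.date_range(steps['start_date'].min(), end=steps['end_date'].max(), freq='D')

    # smooth shaping factor: quarterly means of noise, interpolated back to daily (akima needs SciPy).
    # Note: dtind is daily, so dtind.hour is always 0 and (dtind.hour - 12) ** 2 is the constant 144.
    smooth_series = pd.Series(
        index=dtind,
        data=(dtind.hour - 12) ** 2 * np.random.normal(loc=10, scale=25, size=len(dtind)),
    ).resample('Q-Feb').mean().resample('D').interpolate(method='akima') / 75

    # rescale to mean 1, then shift so the shape is centred on 0
    smooth_series /= smooth_series.mean()
    smooth_series -= 1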
    df = smooth_series.rename('shape').to_frame().join(steps_series.rename('step_series').to_frame())

    # long-term trend: centred 3-year rolling mean of the steps, edges filled
    df = df.assign(predict_trend=steps_series.rolling(365 * 3, center=True).mean().bfill().ffill()).dropna()

    letters=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']

    # one letter per distinct step level
    df['labels'] = df['step_series'].map(dict(zip(df['step_series'].unique(), letters)))

    return df
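produce_data() depends on the current date (through the pd.Timestamp.utcnow() default) and on SciPy (for method='akima'), so its output changes from day to day. When comparing approaches, a small deterministic stand-in may be handier. The sketch below is hypothetical: small_test_data is not from the question, it only mimics the column names shape, step_series, labels and predict_trend, uses a seeded NumPy generator, and swaps in a plain sine wave for the shaping factor.

```python
import numpy as np
import pandas as pd


def small_test_data(n_steps: int = 6, days_per_step: int = 60, seed: int = 0) -> pd.DataFrame:
    """Hypothetical deterministic stand-in for produce_data() (assumes n_steps <= 26)."""
    rng = np.random.default_rng(seed)  # seeded generator: every run gives the same data
    idx = pd.date_range("2024-01-01", periods=n_steps * days_per_step, freq="D")
    t = np.arange(len(idx))
    df = pd.DataFrame(
        {
            # smooth shaping factor centred near 0 (a sine wave instead of the akima curve)
            "shape": 0.3 * np.sin(2 * np.pi * t / 365.0),
            # one flat random level per step, similar to random.random() * 45 in the question
            "step_series": np.repeat(rng.uniform(0, 45, size=n_steps), days_per_step),
            # one letter label per step
            "labels": np.repeat([chr(ord("a") + i) for i in range(n_steps)], days_per_step),
        },
        index=idx,
    )
    # flat long-term trend keeps the example simple
    df["predict_trend"] = df["step_series"].mean()
    return df


data = small_test_data()
print(data.groupby("labels")["step_series"].first())
```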

My first attempt

I have tried a few different things, but have not got much further than this:

  • extract the long-term trend from the data
  • apply the shaping factor to the long-term trend
  • use groupby and transform to rescale each step so that its mean matches the step value

However, this leaves large discontinuities at the step boundaries.


data = produce_data()

# extract the long-term trend from the data
data['predict_trend'] = data['step_series'].rolling(365 * 3, center=True).mean().bfill().ffill()

# apply the shaping factor to the long-term trend
data['raw_prediction'] = (data['shape'] + 1) * data['predict_trend']

# rescale each step so its mean matches the step value
data['first_attempt_adjusted'] = (
    data['raw_prediction']
    * data.groupby('labels')['step_series'].transform('mean')
    / data.groupby('labels')['raw_prediction'].transform('mean')
)

data[['step_series', 'raw_prediction', 'first_attempt_adjusted']].plot()
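A tiny example may help show where the jumps come from: every step is multiplied by its own constant factor (step mean divided by raw_prediction mean), so adjacent steps end up on different levels and the jump between step values reappears at each boundary. The sketch below uses made-up data (two steps of 10 days each) and a hypothetical helper, boundary_jumps, which measures the day-to-day change on the days where the label changes.

```python
import numpy as np
import pandas as pd


def boundary_jumps(values: pd.Series, labels: pd.Series) -> pd.Series:
    """Hypothetical helper: absolute day-to-day change of `values` where the label changes."""
    prev = labels.shift()
    at_boundary = labels.ne(prev) & prev.notna()  # skip the first row, which has no predecessor
    return values.diff().abs()[at_boundary]


# Two steps (10, then 20) and a smooth, gently rising raw prediction.
idx = pd.date_range("2024-01-01", periods=20, freq="D")
labels = pd.Series(["a"] * 10 + ["b"] * 10, index=idx)
step = pd.Series([10.0] * 10 + [20.0] * 10, index=idx)
raw = pd.Series(np.linspace(9.0, 11.0, 20), index=idx)

# Same per-step rescaling as first_attempt_adjusted.
adjusted = raw * step.groupby(labels).transform("mean") / raw.groupby(labels).transform("mean")
print(boundary_jumps(adjusted, labels))            # one boundary, jump of about 8.6
print(adjusted.groupby(labels).diff().abs().max())  # within-step changes are at most about 0.2
```

Each step's mean is matched exactly, but the smooth raw curve is broken at the boundary by the ratio of the two scaling factors.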

I have tried various kinds of smoothing, but have not found a good way to remove the discontinuities.
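One reason smoothing after the fact struggles: any smoother that mixes values across a step boundary also moves each step's mean, so fixing requirement #2 breaks requirement #1. The made-up example below smooths a two-step series with a centred 5-day rolling mean (min_periods=1 so the edges are not NaN).

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=20, freq="D")
labels = pd.Series(["a"] * 10 + ["b"] * 10, index=idx)
step = pd.Series([10.0] * 10 + [20.0] * 10, index=idx)

# A centred 5-day rolling mean turns the single jump of 10 into a ramp of steps of 2...
smoothed = step.rolling(5, center=True, min_periods=1).mean()
print(smoothed.diff().abs().max())           # → 2.0
# ...but it leaks values across the boundary, so each step's mean moves.
print(smoothed.groupby(labels).mean())       # a: 10.6, b: 19.4 instead of 10 and 20
```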

Please remember: priority #1 is that the average of the result over each step matches the step value, and #2 is that there are no discontinuities.
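One possible way to satisfy #1 and #2 together, offered as a sketch rather than a known answer, is to solve for both at once as an equality-constrained least-squares problem: keep raw_prediction as the starting curve, add a correction whose second differences are as small as possible (so the shape of the shaping factor is mostly kept), and require each step's mean to match exactly. This is a swapped-in technique solved through its KKT (Lagrange multiplier) linear system, not something from the question. The function name smooth_mean_preserving, the additive correction and the dense-matrix formulation are all assumptions. It needs at least two steps, otherwise the system is singular.

```python
import numpy as np
import pandas as pd


def smooth_mean_preserving(raw: pd.Series, target: pd.Series, labels: pd.Series) -> pd.Series:
    """Hypothetical sketch: return raw + c, where c is the correction with the smallest
    second differences such that each label's mean of the result equals that of `target`."""
    r = raw.to_numpy(dtype=float)
    n = len(r)
    codes, uniques = pd.factorize(labels)  # assumes no missing labels
    k = len(uniques)                       # needs k >= 2 for a unique solution

    # Averaging matrix: row j holds 1/n_j on the days that carry label j.
    A = np.zeros((k, n))
    A[codes, np.arange(n)] = 1.0
    A /= A.sum(axis=1, keepdims=True)

    # How far each step's mean has to move.
    goal = A @ target.to_numpy(dtype=float) - A @ r

    # Second-difference operator, shape (n - 2, n): (D2 @ c)[i] = c[i] - 2 c[i+1] + c[i+2].
    D2 = np.diff(np.eye(n), n=2, axis=0)

    # KKT system of: minimise ||D2 c||^2 subject to A c = goal.
    kkt = np.block([[2.0 * D2.T @ D2, A.T], [A, np.zeros((k, k))]])
    rhs = np.concatenate([np.zeros(n), goal])
    correction = np.linalg.solve(kkt, rhs)[:n]
    return pd.Series(r + correction, index=raw.index, name="smooth_adjusted")


# Tiny demo with made-up data: three steps and a smooth raw prediction.
idx = pd.date_range("2024-01-01", periods=120, freq="D")
labels = pd.Series(np.repeat(["a", "b", "c"], [30, 40, 50]), index=idx)
step = pd.Series(np.repeat([10.0, 20.0, 15.0], [30, 40, 50]), index=idx)
raw = pd.Series(15.0 + 2.0 * np.sin(2 * np.pi * np.arange(120) / 120), index=idx)

result = smooth_mean_preserving(raw, step, labels)
print(result.groupby(labels).mean())  # matches the step values 10, 20, 15 up to rounding
print(result.diff().abs().max())      # largest daily change, well below the step jump of 10
```

Applied to the question's frame this would be something like data['smooth_adjusted'] = smooth_mean_preserving(data['raw_prediction'], data['step_series'], data['labels']). The dense matrices grow with the square of the number of days, so for several thousand days a sparse formulation (for example scipy.sparse with scipy.sparse.linalg.spsolve) would likely be preferable. Because the correction is additive, the result can drift further from the shaping factor where steps differ a lot; that is the trade-off against priority #3.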

Tags: python | Pandas shaping with adjustment - Stack Overflow