
I have data from a PLC coming in ONLY ON CHANGE into a table with the format (TIMESTAMP, TAG, VALUE). A visualisation tool (Seeq) queries this base table in Snowflake and shows the data on a time-series chart. If a user selects a long time range, the data needs to be aggregated (max 2000 points per time-series plot). I want this aggregation (an average) to be weighted by how long a tag held each value before it changed. For example, if tag = 'cheese' has a value of 100 from t=0 -> t=5 and a value of 500 from t=6 -> t=100, and the user in Seeq selects this tag over a long window (e.g. spanning t=0 to t=100000), then the data registered for this tag should be aggregated to (5*100 + 95*500)/100, plotted at t=50 (the mid point). How can I write a query for this in Snowflake against this base table of (TIMESTAMP, TAG, VALUE)?

I tried cross joining the raw data to a time dimension table (created from a tag dimension table), then using a LEAD function to spread the raw data out to one row per second so it could be weighted accordingly. It was not very performant in terms of speed.
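
For reference, a minimal sketch of the duration logic I was aiming for, using LEAD directly on the raw table (the table name plc_events and its columns are placeholders; the real attempt also cross joined to a per-second time dimension, which is what made it slow):

-- Hypothetical raw table: plc_events(ts NUMBER, tag VARCHAR, value FLOAT).
-- Each value is assumed to hold until the next change recorded for that tag.
select
    tag,
    ts,
    value,
    lead(ts) over (partition by tag order by ts) - ts as duration_secs
from plc_events;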

  • Something like NTILE (docs.snowflake.com/en/sql-reference/functions/ntile) might be a way to chunk the data, and then do some average/weighted operation. – Simeon Pilgrim

1 Answer

I am not really sure what you are trying to do, but an "explosive" (row-expanding) way of doing something like what you describe is:

with d0 as (
    -- sample data: one row per constant-value interval (tag, start, end, value)
    select * from values
        ('cheese', 0, 5, 100),
        ('cheese', 6, 100, 500)
        t(tag, _s, _e, val)
), d1 as (
    -- explode each interval into one row per second, then split each tag's
    -- timeline into 10 equal-count chunks with NTILE
    select
        tag,
        value::number as rn,    -- the generated second from the flattened range
        val,
        ntile(10) over (partition by tag order by rn) as tile
    from d0,
        table(flatten(array_generate_range(_s, _e + 1)))
)
select
    tag,
    tile,
    avg(rn)  as mid,    -- representative timestamp for the chunk
    avg(val) as val     -- plain average over per-second rows = time-weighted average
from d1
group by 1, 2
order by 1, 2;

which gives:

TAG TILE MID VAL
cheese 1 5.000000 281.818182
cheese 2 15.500000 500.000000
cheese 3 25.500000 500.000000
cheese 4 35.500000 500.000000
cheese 5 45.500000 500.000000
cheese 6 55.500000 500.000000
cheese 7 65.500000 500.000000
cheese 8 75.500000 500.000000
cheese 9 85.500000 500.000000
cheese 10 95.500000 500.000000

Those rows do not really need expanding, though; the interpolation can be driven directly against a d0-like table, if that is how your data is sourced.
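
For example, here is a minimal non-exploding sketch of that idea against the same d0-style intervals (tag, _s, _e, val). It assumes the plot window is t = 0..100, uses WIDTH_BUCKET to form 10 buckets, and, for simplicity, assigns each interval wholly to the bucket of its start time (intervals straddling a bucket boundary would need to be split for exact results):

with d0 as (
    select * from values
        ('cheese', 0, 5, 100),
        ('cheese', 6, 100, 500)
        t(tag, _s, _e, val)
)
select
    tag,
    width_bucket(_s, 0, 101, 10)                as bucket,             -- 10 buckets over t = 0..100
    avg((_s + _e) / 2)                          as mid,                -- representative x value for the plot
    sum(val * (_e - _s + 1)) / sum(_e - _s + 1) as time_weighted_avg   -- duration-weighted average
from d0
group by 1, 2
order by 1, 2;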
