admin管理员组

文章数量:1122832

Let’s assume a pellet-fired meat smoker (cooking device) with temperature setting and a temperature probe. The temperature is maintained automatically by an embedded controller with limited memory and an uptime clock. Uptime (duration of use) will vary and is not known in advance.

Let’s say we want to store temperature probe samples so that at any time during the use, a line-chart of temperature samples can be generated.

Is there a clever/elegant algorithm and data structure for maintaining and culling probe readings which are approximately evenly spaced in time over any duration of cook? Assume that the uptime clock value would require one unit of memory and each sample takes one unit of memory, and there are only M units of memory for this purpose, how can we store the most samples which have optimal spacing over time, culling and replacing the most redundant samples as necessary and the uptime grows? Can it be done without moving samples and without explicitly storing the timestamp with each sample?

Maybe related to “Reservoir Sampling” but not sure that’s a match here because we need to reconstruct when the sample was captured too, not just that the value is in the reservoir somewhere.

A poor solution, I think: If we have 3 units of memory for samples, then taking a reading at every power of two time and storing them in a ring buffer, we'd retain a 25% reading, a 50% reading, and the most recent reading. Waiting double that time again, the previous 25% (oldest reading) would be replaced. The previous 50% reading is now twice as old again and is considered the new 25%, quarter-time sample and the previous reading is the mid-time sample. But this biases our good time resolution to the past and we have hardly any data about recent events.

Let’s assume a pellet-fired meat smoker (cooking device) with temperature setting and a temperature probe. The temperature is maintained automatically by an embedded controller with limited memory and an uptime clock. Uptime (duration of use) will vary and is not known in advance.

Let’s say we want to store temperature probe samples so that at any time during the use, a line-chart of temperature samples can be generated.

Is there a clever/elegant algorithm and data structure for maintaining and culling probe readings which are approximately evenly spaced in time over any duration of cook? Assume that the uptime clock value would require one unit of memory and each sample takes one unit of memory, and there are only M units of memory for this purpose, how can we store the most samples which have optimal spacing over time, culling and replacing the most redundant samples as necessary and the uptime grows? Can it be done without moving samples and without explicitly storing the timestamp with each sample?

Maybe related to “Reservoir Sampling” but not sure that’s a match here because we need to reconstruct when the sample was captured too, not just that the value is in the reservoir somewhere.

A poor solution, I think: If we have 3 units of memory for samples, then taking a reading at every power of two time and storing them in a ring buffer, we'd retain a 25% reading, a 50% reading, and the most recent reading. Waiting double that time again, the previous 25% (oldest reading) would be replaced. The previous 50% reading is now twice as old again and is considered the new 25%, quarter-time sample and the previous reading is the mid-time sample. But this biases our good time resolution to the past and we have hardly any data about recent events.

Share Improve this question edited Nov 21, 2024 at 15:09 Jason Kleban asked Nov 21, 2024 at 13:59 Jason KlebanJason Kleban 20.7k18 gold badges80 silver badges127 bronze badges 3
  • One suggestion for the time stamp would be to store the offset from the previous sample, which you should be able to do more compactly, but then that would obviously require updates when you delete intermediate samples to make room for more as the buffer approaches fullness. You could store the offset from the start of the session instead, but that would presumably require more space. – 500 - Internal Server Error Commented Nov 21, 2024 at 14:50
  • Hardly any data about recent events sounds right, with old data surviving only if they reflect more recent events. I can offer ideas for how to approach this, but I don't know a generally optimal approach. Would tentative ideas at the answer be acceptable? – btilly Commented Nov 21, 2024 at 15:19
  • @btilly I don’t really understand where you say “sounds about right”? – Jason Kleban Commented Nov 22, 2024 at 1:41
Add a comment  | 

2 Answers 2

Reset to default 1

Let's start with an easy, somewhat sub-optimal solution:

Solution 1: Whenever your buffer fills up, discard every 2nd sample and divide your sampling rate by 2.

This has some nice properties -- you always have a consistent sampling rate across the whole uptime period and your samples are always regularly spaced.

Of course the problem with this solution is that sometimes you are leaving up to half your buffer unused, and if it was all used, you could have almost twice as many samples as you do.

It seems to get better if we just used smaller steps:

Solution 2: Whenever your buffer fills up, discard every 10th sample and multiply your sampling rate by 0.9.

Now you are always using at least 90% of your buffer, which is great, but the spacing of the samples you keep is non-uniform and will end up with some annoying visible patterns in it.

In addition to always making good use of the buffer, we would like the spacing of our samples to be nearly uniform at all times.

This requires a little magic:

Solution 3: Blue noise mask

To fix the remaining problem, we can use a trick called a "blue noise mask". There's a lot you can google about them, but since the technique was invented for digital halftoning (dithering), almost all of it is for 2D applications. It does work just as well for 1D, though. It's also conceptually very simple. Adapted for your specific problem, it works as follows.

Let's say you want to each buffer compression to reduce the number of samples by about 10%. We'll divide the count by about 7^(1/19) each time, so that after 19 compressions, we have divided the sample count by exactly 7. I think the prime numbers will help make the result look more natural.

The blue noise mask we want is just an array of numbers, where each index corresponds to a sample time, such that:

  • The number at each index represents the number of compressions that the corresponding sample will survive. We will use numbers from 0 to 19
  • The number of samples with lifetime >= x is about 7^(1/19) * the number with lifetime >= x+1
  • Exactly every 7th sample has lifetime 19
  • The length of the array is a multiple of 7 large enough so that the repeats won't be noticable. About 252 should be fine.
  • Selecting all the indexes with lifetimes >= x, for any 0 <= x <= 19, always produces an approximately uniform distribution.

Given such a mask, we have a really good solution to your problem:

When your buffer fills up, apply a compression using the blue noise mask. Starting at 0, for the nth compression, iterate through the mask and:

  1. For each entry == (n%19), discard a sample.
  2. For each entry > (n%19), keep a sample

Note that the samples with entries < (n%19) are the ones that were discarded in previous compressions.

Iterate though the mask repeatedly until you've accounted for every sample in your buffer.

The mask should also be used to control your sampling rate. Always sample at the times that would have been preserved by the last compression.

Making a blue noise mask

This is a really good solution for your problem, but in the requirements listed above for the blue noise mask, this one is hard to meet: "Selecting all the indexes with lifetimes >= x, for any 0 <= x <= 19, always produces an approximately uniform distribution"

To accomplish that feat, start by assigning lifetime 19 to every 7th entry, starting at 0. Then use 1D Poisson disk sampling with appropriately decreasing disk size to gradually fill in the entries for shorter lifetimes.

Here it is as a suboptimal solution, but visualized. Not as tricky as I originally thought. This memory stays full, doesn't need to move data (each shade of blue represents the memory cell where that sample was written/is read). Each row is a time-unit-longer duration cook time. The biggest downside is a lack of values in the first 25% of the log (passStep *= 4;) which can be reduced, but at the expense of the size of the jumps with each pass - at those points we'd experience suddenly larger gaps of time in our recent history.

function* ringIndexSequence(memorySize, until = Infinity) {
    let t = 0;
    let pass = 0;
    let index = 0;
    let firstTOfPass = 0;
    let passStep = 1;

    while(t < until) {
        yield { t, index };

        if (index === memorySize - 1) {
            firstTOfPass += memorySize * (2 ** pass);
            index = -1;
            pass += 1;
            passStep *= 4;
        }

        t += passStep;
        index += 1;
    }
}

const memorySize = 4;

const progression = Array.from(new Array(80)).reduce((acc, _, cur) => {
    const all = Array.from(ringIndexSequence(memorySize, cur));
    const trailing = all.slice(-memorySize);

    const rendered = trailing.reduce(
      (acc, { t, index }) => acc.toSpliced(t, 1, '<div class="moment" style="background-color: hsl(234 100% ' + ((index + 1) / memorySize) * 95 + '%)"></div>'), 
      Array.from(new Array(cur)).fill('<div class="moment"></div>'));

    return [ ...acc, '<div>' + rendered.join('') + '</div>' ];
}, []);

document.getElementById('demo-out').innerHTML = progression.join('');
.moment {
    display: inline-block;
    height: 2px;
    width: 2px;
    background-color: hsl(234 100% 99%);
}

* {
  line-height: 0;
  font-size: 0;
}
<div id="demo-out" />

本文标签: algorithmStrategies for tracking N quasiperiodic samples over an unbounded durationStack Overflow