Removing Noise (Jumps and Drops) from Sensor Data for Fuel Consumption

I am working with fuel consumption data received from a sensor, but sometimes the data contains noise (sudden jumps or drops) that makes it inconsistent. My goal is to identify and remove these outliers to ensure the data is accurate and reliable for further analysis.

Here are the key details:

  • Each sensor record contains a Unix timestamp, the fuel consumption value, the speed, and other data.
  • The sensor delivers between 40 and 80 records every 10 minutes.
  • I need a consistent and robust method to filter out the noise and smooth the data.
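Converting the record rate into a sample spacing makes it easier to pick window sizes. Below is a minimal sketch. It assumes the Unix timestamps are whole seconds, which the question does not state. FromUnixSeconds and AverageIntervalSeconds are hypothetical helper names.

```csharp
using System;

// Assumption: timestamps are Unix time in whole seconds. If the sensor reports
// milliseconds, use DateTimeOffset.FromUnixTimeMilliseconds instead.
static DateTime FromUnixSeconds(long unixSeconds) =>
    DateTimeOffset.FromUnixTimeSeconds(unixSeconds).UtcDateTime;

// 40-80 records per 10 minutes (600 s) gives the average spacing between samples.
static double AverageIntervalSeconds(int recordsPerTenMinutes) =>
    600.0 / recordsPerTenMinutes;

Console.WriteLine(FromUnixSeconds(1_700_000_000).ToString("o"));
Console.WriteLine($"{AverageIntervalSeconds(80)} s to {AverageIntervalSeconds(40)} s between samples");
```

The 8-sample window passed to ApplyMovingAverage would therefore span roughly 8 × 7.5 s = 60 s to 8 × 15 s = 120 s of data.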

Below is the code I have implemented so far:


// value == Fuel Consumption
var data = FileReader.ReadCsv(path).Where(d => d.Value > 0).ToList();

var cleanedData = RemoveOutliers(data.Select(d => new DataPoint(d.Timestamp, d.Value, d.Speed)).ToList(), 1.5);
cleanedData = ApplyMovingAverage(cleanedData, 8);

List<AnomalyDetectionResult> anomalyDetectionResults = [];
foreach (var dataPoint in cleanedData)
{
    // todo
}

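// Keeps only points inside [Q1 - iqrFactor * IQR, Q3 + iqrFactor * IQR],
// with the quartiles computed over the whole batch.
static List<DataPoint> RemoveOutliers(List<DataPoint> data, double iqrFactor)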
{
    var values = data.Select(d => d.Value).ToList();
    values.Sort();

    double q1 = GetPercentile(values, 25);
    double q3 = GetPercentile(values, 75);
    double iqr = q3 - q1;
    double lowerBound = q1 - iqrFactor * iqr;
    double upperBound = q3 + iqrFactor * iqr;

    return data.Where(d => d.Value >= lowerBound && d.Value <= upperBound).ToList();
}

// Trailing moving average: each output is the mean of the current point
// and up to windowSize - 1 preceding points.
static List<DataPoint> ApplyMovingAverage(List<DataPoint> data, int windowSize)
{
    var smoothedData = new List<DataPoint>();
    for (int i = 0; i < data.Count; i++)
    {
        // Average only points start..i, so early outputs never use later samples.
        int start = Math.Max(0, i - windowSize + 1);
        double avg = data.Skip(start).Take(i - start + 1).Average(d => d.Value);
        smoothedData.Add(new DataPoint(data[i].Timestamp, avg, data[i].Speed));
    }
    return smoothedData;
}

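// Percentile by linear interpolation between the two nearest ranks; expects values sorted ascending.
static double GetPercentile(List<double> sortedValues, double percentile)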
{
    if (!sortedValues.Any()) return 0;

    double rank = percentile / 100.0 * (sortedValues.Count - 1);
    int lowerIndex = (int)Math.Floor(rank);
    int upperIndex = (int)Math.Ceiling(rank);

    if (lowerIndex == upperIndex) return sortedValues[lowerIndex];

    return sortedValues[lowerIndex] + (rank - lowerIndex) * (sortedValues[upperIndex] - sortedValues[lowerIndex]);
}

public class DataPoint(DateTime timestamp, double value, int speed)
{
    public DateTime Timestamp { get; set; } = timestamp;
    public double Value { get; set; } = value;
    public int Speed { get; set; } = speed;
}

Before my code: [screenshot of the raw sensor data]

After my code, there are still drops: [screenshot of the filtered data]
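This small check on made-up numbers suggests one plausible reason the drops survive. RemoveOutliers computes a single pair of bounds for the whole batch, so a dip that is abnormal locally can still fall inside the global range. ApplyMovingAverage then only dilutes that dip across the following samples. The data are synthetic, and Percentile is a local copy of GetPercentile so that the sketch runs on its own.

```csharp
using System;
using System.Linq;

// Made-up ramp from 10 to 40 in steps of 2, with one artificial drop at index 10 (30 -> 15).
double[] values = Enumerable.Range(0, 16).Select(i => 10.0 + 2 * i).ToArray();
values[10] = 15;

// Same rule as RemoveOutliers with iqrFactor = 1.5: one pair of bounds for the whole batch.
double[] sorted = values.OrderBy(v => v).ToArray();
double q1 = Percentile(sorted, 25);
double q3 = Percentile(sorted, 75);
double iqr = q3 - q1;
double[] kept = values.Where(v => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr).ToArray();
Console.WriteLine($"IQR filter kept {kept.Length} of {values.Length} points");

// Trailing 8-sample mean, as in ApplyMovingAverage: the drop is diluted, not removed.
double[] smoothed = values.Select((_, i) =>
{
    int start = Math.Max(0, i - 7);
    return values[start..(i + 1)].Average();
}).ToArray();
Console.WriteLine($"smoothed[10] = {smoothed[10]} (23 without the drop)");

// Linear-interpolation percentile, mirroring GetPercentile from the question.
static double Percentile(double[] sortedValues, double p)
{
    double rank = p / 100.0 * (sortedValues.Length - 1);
    int lo = (int)Math.Floor(rank);
    int hi = (int)Math.Ceiling(rank);
    return sortedValues[lo] + (rank - lo) * (sortedValues[hi] - sortedValues[lo]);
}
```

Here the global bounds are about -9.4 and 57.6, so the value 15 is kept. The moving average then lowers the next several outputs a little instead of removing the drop.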

I would appreciate any guidance, suggestions, or alternative approaches to solving this problem.

Asked by DPTP.

  • Is there any problem with your code? Does it fail to achieve what you want, and if so, how? Note that asking for library recommendations is outside the scope of Stack Overflow questions. – Klaus Gütter
  • That sounds like a math problem before it becomes a code problem. Have you settled the math, and could you give the formulas that you are trying to implement? – Fildor
  • Unrelated: why do you filter out values equal to 0? If the engine is not running but the sensor is still reading data, you surely want to see exactly that, don't you? – Fildor
  • Also, the question is lacking a statement about how the result is not meeting your expectations. What is your goal? – Fildor
  • Question updated. – DPTP

1 Answer

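The RemoveOutliers method seems incorrect to me. If I understand it correctly, it would remove the top and bottom portions of a constant ramp, because its bounds are computed from the quartiles of the whole series rather than from each point's neighbours. So I would remove this part completely.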
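The answer suggests dropping this step. If some outlier removal is still wanted, one alternative the answer does not mention is a local test such as a Hampel filter. It compares each sample only with the median of its neighbours, so a steady ramp passes through while an isolated drop is replaced. This is a minimal sketch on double[]. The half-width of 3 samples and the threshold of 3 scaled MADs are illustrative assumptions, not tuned values.

```csharp
using System;
using System.Linq;

// Made-up ramp from 10 to 40 in steps of 2, with one artificial drop at index 10 (30 -> 5).
double[] fuel = Enumerable.Range(0, 16).Select(i => 10.0 + 2 * i).ToArray();
fuel[10] = 5;

double[] cleaned = HampelFilter(fuel, halfWindow: 3, nSigmas: 3);
Console.WriteLine(string.Join(", ", cleaned));

// Hampel filter: a sample is replaced by the median of its window when it deviates
// from that median by more than nSigmas * 1.4826 * MAD (median absolute deviation).
static double[] HampelFilter(double[] x, int halfWindow, double nSigmas)
{
    var result = (double[])x.Clone();
    for (int i = 0; i < x.Length; i++)
    {
        // Window is clipped at both ends of the series.
        int start = Math.Max(0, i - halfWindow);
        int end = Math.Min(x.Length - 1, i + halfWindow);
        double[] window = x[start..(end + 1)];
        double median = Median(window);
        double mad = Median(window.Select(v => Math.Abs(v - median)).ToArray());
        // 1.4826 scales the MAD to a standard-deviation estimate for Gaussian noise.
        double threshold = nSigmas * 1.4826 * mad;
        if (Math.Abs(x[i] - median) > threshold)
            result[i] = median;
    }
    return result;
}

static double Median(double[] values)
{
    double[] s = values.OrderBy(v => v).ToArray();
    int n = s.Length;
    return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
}
```

In this example only index 10 changes, and it becomes the local median 28. In a perfectly flat window the MAD is 0, so any deviation at all is replaced. With real sensor noise, that case may call for a small minimum threshold.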

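You might consider switching to a Gaussian filter. It tends to produce a better result than a simple moving average because its frequency response falls off smoothly, without the ripple that the moving average's rectangular window produces. You might also consider doing some frequency analysis to optimize the filter parameters for your particular case.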
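The Gaussian filter mentioned above can be sketched as a weighted average with sampled Gaussian weights. This sketch assumes roughly evenly spaced samples and uses a centred, non-causal window. That introduces no lag but requires later samples, so it suits offline analysis. GaussianSmooth is a hypothetical name, sigma is measured in samples, and the kernel is truncated at ±3 sigma.

```csharp
using System;
using System.Linq;

// Made-up readings with one drop at index 4.
double[] noisy = { 20, 21, 19, 20, 8, 20, 21, 20, 19, 20 };
double[] smooth = GaussianSmooth(noisy, sigma: 1.5);
Console.WriteLine(string.Join(", ", smooth.Select(v => v.ToString("F1"))));

// Each output is a weighted mean of neighbours at offset k with weight exp(-k^2 / (2 sigma^2)).
// Near the ends the weights are renormalised over the samples that exist.
static double[] GaussianSmooth(double[] x, double sigma)
{
    int radius = (int)Math.Ceiling(3 * sigma);
    double[] kernel = Enumerable.Range(-radius, 2 * radius + 1)
        .Select(k => Math.Exp(-(k * k) / (2 * sigma * sigma)))
        .ToArray();

    var result = new double[x.Length];
    for (int i = 0; i < x.Length; i++)
    {
        double weighted = 0, weightSum = 0;
        for (int k = -radius; k <= radius; k++)
        {
            int j = i + k;
            if (j < 0 || j >= x.Length) continue; // skip offsets outside the series
            double w = kernel[k + radius];
            weighted += w * x[j];
            weightSum += w;
        }
        result[i] = weighted / weightSum;
    }
    return result;
}
```

A larger sigma suppresses more high-frequency content but also blurs genuine changes in consumption. Like the moving average, it spreads an isolated drop instead of removing it, so it is usually paired with an outlier step.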

A more modern approach might be to train a neural network, assuming you can somehow get large amounts of both the true and noisy data for training.

You might also want to investigate the source of the noise: if you can build a better model of how it occurs, you may find a better way to approach the problem. If this is a vehicle, perhaps the noise has something to do with fuel sloshing? If so, you might be able to identify the start and end of these events and disregard those samples.
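Once such events can be identified, excluding their samples is straightforward. This is a minimal sketch in which KeepIndicesOutsideEvents and its margin parameter are hypothetical names. The detector in the example flags any change of more than 8 units between consecutive samples. That rule is an illustrative assumption, not a validated way to recognise sloshing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up readings with a drop from index 4 to 5.
double[] fuel = { 20, 21, 20, 22, 9, 8, 21, 22, 21, 20 };

// Hypothetical detector: a jump of more than 8 units marks the start or end of an event.
List<int> kept = KeepIndicesOutsideEvents(fuel.Length,
    i => i > 0 && Math.Abs(fuel[i] - fuel[i - 1]) > 8, margin: 1);
Console.WriteLine(string.Join(", ", kept));

// Returns the indices of samples that are more than `margin` samples away
// from every sample flagged by `isEvent`.
static List<int> KeepIndicesOutsideEvents(int count, Func<int, bool> isEvent, int margin)
{
    var excluded = new bool[count];
    for (int i = 0; i < count; i++)
    {
        if (!isEvent(i)) continue;
        for (int j = Math.Max(0, i - margin); j <= Math.Min(count - 1, i + margin); j++)
            excluded[j] = true;
    }
    return Enumerable.Range(0, count).Where(i => !excluded[i]).ToList();
}
```

Flagging only the jumps marks where an event starts and ends. For events longer than the margin covers, you would need to exclude the whole span between a start and its matching end, which this sketch does not attempt.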
