I am generating noise using a Gaussian distribution, and have hit a conceptual block.

Is there a difference between generating a noise value and adding it to clean data:

def add_noise(data_frame, amplitude):
    noise = np.random.normal(0, scale = amplitude * 0.01, size = len(data_frame))
    return data_frame + noise

Or generating noise directly using the data you have:

def add_noise_alt(data_frame, amplitude):
    noise = np.random.normal(data_frame, scale = amplitude * 0.01)
    return noise

The plots returned are very similar, but conceptually they seem to be different things.

asked Nov 19, 2024 at 17:43 by hiddenuser; edited Nov 19, 2024 at 17:46 by chrslg
  • Short answer: it depends on the source of your noise! The forms you quote above are entirely equivalent. For additive Gaussian noise there is no difference, provided you know the noise level is constant, due to, say, thermal noise in an amplifier. But it could be a problem if the noise were still Gaussian yet proportional to amplitude, as might happen with a candle flickering in a light breeze or a star twinkling. Poisson noise, where the variance of the noise itself depends on amplitude, is another matter entirely: there you must take the signal amplitude into account every time (a sketch of this amplitude-dependent case follows these comments). – Martin Brown Commented Nov 19, 2024 at 17:56
  • Huh. I never knew you could pass an array as the first argument to np.random.normal. I suppose it's just for this purpose. – Frank Yellin Commented Nov 19, 2024 at 17:59
  • Even then, x + f(x)*np.random.normal(0, 1, x.shape) and np.random.normal(x, f(x)) don't seem fundamentally different (it is not as if you would have to write a for loop in one case). Or am I missing something? Clearly there are some noises for which it is more complicated, but here we are talking about Gaussian noise. Even if each center and each standard deviation is unique, available in two arrays centers and stdev, np.random.normal(centers, stdev) would work, and so would centers + stdev*np.random.normal(0, 1, size=len(centers)). I would prefer the first, because if numpy offers to do something for you, it is generally a good idea to let it do it; numpy is usually more efficient than me :D. But fundamentally it is the same, and it is not as if the zillions of * and + in the second form were done in pure python. – chrslg Commented Nov 19, 2024 at 18:03 and 18:04
  • Plus, I may add, in favor of the np.random.normal(loc, scale) form: numerical error may be better when there are huge differences between the standard deviations (like factors of 10⁵⁰). Maybe np.random.normal([1e-30, 1e-30], [1e-30, 1e30]) is better than np.random.normal(0, 1, 2)*[1e-30, 1e30] + [1e-30, 1e-30]. But I am not even sure of that. – chrslg Commented Nov 19, 2024 at 18:10
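A minimal sketch of the amplitude-dependent case raised in the comments. The 5% relative noise level, the signal x, and the use of np.random.default_rng are illustrative choices, not anything from the question:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 100.0, 1_000)   # clean signal
rel = 0.05                           # illustrative: noise stdev is 5% of the signal amplitude

# Form 1: draw zero-mean noise, scale it by the signal, then add it
noisy_a = x + rel * x * rng.standard_normal(x.shape)

# Form 2: let np.random broadcast per-sample means and stdevs
noisy_b = rng.normal(loc=x, scale=rel * x)

# Both draw from N(x, (rel*x)^2). The genuinely different case is e.g. Poisson
# (shot) noise, where the whole distribution depends on the signal:
noisy_c = rng.poisson(x).astype(float)

Either Gaussian form handles the amplitude-proportional case; only a distribution like the Poisson one actually forces you to build the noise from the signal.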

2 Answers


Mathematically, it is the same.

Each value is data_frame plus centered (zero-mean) noise with standard deviation amplitude × 0.01.

You can see it:

import matplotlib.pyplot as plt
import numpy as np
x=np.arange(1000000)%100.0 # Just a way to have values between 0 and 100.
y=np.random.normal(x, 0.1) # Note that stdev is small compared to x values
plt.hist(y-x, bins=50)
plt.show()
print((y-x).mean()) # 1.0911322496417843e-05 Small enough
print((y-x).std()) # 0.10018683917918221  close enough to 0.1

So clearly, y-x is just normally distributed with mean 0 and stdev 0.1, exactly as if I had defined y = x + np.random.normal(0, 0.1, x.shape).

From a computational point of view, I would say that both are equally well vectorized. (Whether it is np.random.normal that adds the numbers you pass to it, or numpy's + operator, the cost must be roughly the same.)
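If you want to check that claim yourself, here is a rough timing sketch; the array size, repetition count, and use of timeit are arbitrary choices, not anything from the answer:

import timeit
import numpy as np

x = np.random.rand(1_000_000) * 100
stdev = 0.1

t_loc = timeit.timeit(lambda: np.random.normal(x, stdev), number=20)
t_add = timeit.timeit(lambda: x + np.random.normal(0, stdev, x.shape), number=20)

# Exact numbers depend on the machine and NumPy version; the point is only
# that neither form falls back to a Python-level loop over the elements.
print(f"normal(loc=x): {t_loc:.3f} s,  x + normal(0): {t_add:.3f} s")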

Is there a difference between generating a noise value and adding it to clean data, or generating noise directly using the data you have?

No.

If you look at the way that NumPy uses the loc and scale parameters, it uses them by multiplying the random value by scale, then adding loc.

double random_normal(bitgen_t *bitgen_state, double loc, double scale) {
  return loc + scale * random_standard_normal(bitgen_state);
}

Adding the value during generation or after generation is the same thing.

You can check this idea experimentally by re-seeding the random number generator to the same value multiple times to get the same sequence out.

import numpy as np

values = np.random.rand(100)          # some "clean" data

np.random.seed(42)
loc_generated = np.random.normal(values, size=100)       # noise generated directly around the data

np.random.seed(42)                    # same seed -> same underlying standard-normal draws
add_generated = np.random.normal(0, size=100) + values   # zero-mean noise added afterwards

print(np.allclose(add_generated, loc_generated))

prints

True
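The same check can be sketched with the newer Generator API. This assumes Generator.normal goes through the same loc + scale * standard-normal kernel shown in the C snippet above, so the two runs consume the same draws; if that assumption did not hold, the element-wise comparison could fail even though the two forms remain statistically identical:

import numpy as np

values = np.random.default_rng(0).random(100)     # some "clean" data

rng = np.random.default_rng(42)
loc_generated = rng.normal(values, size=100)      # noise centred on the data

rng = np.random.default_rng(42)                   # restart from the same seed
add_generated = rng.normal(0, size=100) + values  # zero-mean noise added afterwards

print(np.allclose(add_generated, loc_generated))  # expected True under the assumption above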
