I run this code on two machines:

import random

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from pytz import utc  # assumption: the original snippet does not show where utc comes from

# this part is simplified. It is only here to show how the scheduler is basically initialized (for context)
scheduler = AsyncIOScheduler(timezone=utc)
scheduler.start()

# This is the real code (with the exception of the list)
@scheduler.scheduled_job('interval', minutes=1, misfire_grace_time=None)
async def do_dada_news():
    pages = [...] # shortened for better readability. It is longer than 20 elements
    print("---")
    # print up to 20 randomly chosen pages, without repeats
    for page in random.sample(pages, min(len(pages), 20)):
        print(page)

The two machines produce different output, which is strange:

  • Local docker container: I get 20 different lines every time do_dada_news() runs.
  • Kubernetes cluster: I get the exact same 20 lines every time it is run.

I expect both machines to behave the same way. Why do they behave so differently?

As a temporary fix, I now call random.seed(time.time()*10000) inside do_dada_news(), but that does not feel right.

Asked Jan 30 at 22:04 by FEZ; edited Feb 1 at 20:27 by Péter Szilvási.

Comments:
  • Seeding the RNG from the time is the normal way to get a different random sequence on each run. – Barmar Commented Jan 30 at 22:11
  • But why do consecutive calls to random.sample() create different results on one system and always the same on the other? Consecutive calls to any random function usually create different results without seeding in between. – FEZ Commented Jan 30 at 22:20
  • Of course you get different results on consecutive calls, it wouldn't be random if you didn't. Seeding just sets the starting point. As for why you get different results on each system, it could be a difference between Docker and Kubernetes. – Barmar Commented Jan 30 at 23:23

1 Answer

If no seed is provided, Python's built-in random module seeds itself from os.urandom(). Crucially, if the operating system has a built-in source of randomness (Linux and Windows both do), Python uses that by default and only falls back to the system time when no such source is available.
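To make the seeding behavior concrete, here is a minimal sketch. The `pages` list and the `sample_with_seed` helper are hypothetical stand-ins, not part of the question's code. It only illustrates that an explicit seed fixes the sequence, while a seed of `None` asks Python to draw from the operating system's randomness source.

```python
import random

# Hypothetical stand-in for the real list, which is elided in the question.
pages = [f"page-{i}" for i in range(50)]


def sample_with_seed(seed):
    """Draw 20 pages from a private generator seeded with `seed`."""
    # random.Random(seed) is an independent generator; the module-level
    # generator used by random.sample() is left untouched.
    rng = random.Random(seed)
    return rng.sample(pages, min(len(pages), 20))


# The same explicit seed always reproduces the same sample.
same_a = sample_with_seed(1234)
same_b = sample_with_seed(1234)
print(same_a == same_b)  # True

# A seed of None means "seed from OS randomness (or the time as a fallback)",
# so each generator starts from a fresh state.
fresh_a = sample_with_seed(None)
fresh_b = sample_with_seed(None)
print(fresh_a[:3], fresh_b[:3])
```

If the job prints the same 20 lines on every run, one thing worth checking is whether something reseeds the generator with a fixed value, or restores the same generator state, before each run.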

While you could mess with the Linux configuration settings, it would be much easier just to initialize a random seed with random.seed(int(time.time())**20%999979).
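As an alternative to a time-based seed (this is a swapped-in technique, not what the answer above proposes), the standard library's `random.SystemRandom` draws from `os.urandom()` on every call and ignores seeding entirely. A rough sketch, where `pick_pages` is a hypothetical helper name:

```python
import random

# SystemRandom reads from the OS randomness source on each call,
# so it has no seed or internal state that could be repeated.
_system_rng = random.SystemRandom()


def pick_pages(pages, k=20):
    """Return up to k distinct pages chosen with OS-provided randomness."""
    return _system_rng.sample(pages, min(len(pages), k))


# Usage with a hypothetical list of 50 entries.
chosen = pick_pages([f"page-{i}" for i in range(50)])
print(len(chosen))  # 20
```

This trades some speed for independence from the module-level generator's state, which is usually acceptable for a job that runs once a minute.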

Linux in particular uses an entropy pool as its source of randomness, and there is a suggestion that the issue might be ameliorated by an upgrade to Linux 5.6. In general, though, the entropy pool needs a short delay to gather the randomness required.

If I were very concerned about not having this issue in the future, I would set up a queue and create a function that, when called, returns the top number from the queue, dequeues it, and then adds a new random number to the bottom of the queue based on the mod-product of the numbers still in it. That way you should at least be guaranteed a source of randomness that you control.
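One possible reading of the queue idea, as a rough sketch only. The class name `QueueRandom`, the queue size, the use of `os.urandom()` for the initial fill, and the reuse of 999979 as the modulus are all assumptions made for illustration; the answer does not specify them.

```python
import os
from collections import deque

# Assumption: reuse the modulus from the seed expression earlier in the answer.
MODULUS = 999979


class QueueRandom:
    """Queue-based generator: pop the front, push the mod-product of the rest."""

    def __init__(self, size=8):
        # Assumption: fill the queue from os.urandom(); values are kept in
        # 1..MODULUS-1 so that a zero never enters the product.
        self._queue = deque(
            int.from_bytes(os.urandom(4), "big") % (MODULUS - 1) + 1
            for _ in range(size)
        )

    def next(self):
        # Return the top number and remove it from the queue.
        value = self._queue.popleft()
        # Build the replacement from the product of the remaining numbers.
        product = 1
        for n in self._queue:
            product = (product * n) % MODULUS
        # Guard: a zero would make every later product zero as well.
        self._queue.append(product or 1)
        return value


gen = QueueRandom(size=4)
values = [gen.next() for _ in range(10)]
print(values)
```

Once the initial fill is known, every later value is fully determined, so this gives control over the sequence rather than better randomness than the standard library generator.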

Tags: python, random.sample() generating same sequence every time it is run, Stack Overflow