I run this code on two machines:
import random

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from pytz import utc  # assumed; the original snippet does not show where utc comes from

# This part is simplified. It only shows how the scheduler is initialized (for context).
scheduler = AsyncIOScheduler(timezone=utc)
scheduler.start()

# This is the real code (except for the list).
@scheduler.scheduled_job('interval', minutes=1, misfire_grace_time=None)
async def do_dada_news():
    pages = [...]  # shortened for readability; the real list has more than 20 elements
    print("---")
    for page in random.sample(pages, min(len(pages), 20)):
        print(page)
The two machines produce different output, which is strange:
- Local Docker container: I get 20 different lines every time do_dada_news() runs.
- Kubernetes cluster: I get the exact same 20 lines every time it runs.
I expect both machines to behave the same way. How can the behavior differ this much?
To temporarily fix the problem, I now call random.seed(time.time()*10000) inside do_dada_news(). But that does not feel right.
1 Answer
If no seed is provided, Python's built-in random module seeds itself from os.urandom(). Crucially, when the operating system provides a source of randomness (Linux and Windows both do), Python uses that source instead of the system time.
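To make the seeding behavior described above concrete, here is a minimal sketch. It does not reproduce the Kubernetes problem. It only shows that a repeated seed yields an identical sample, while an unseeded generator is seeded from the operating system. The pages list and both function names are hypothetical stand-ins.

```python
import random

# Hypothetical stand-in for the real list of pages from the question.
pages = [f"page-{i}" for i in range(50)]


def sample_with_seed(seed):
    """Draw 20 pages from a private generator created with a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(pages, min(len(pages), 20))


def sample_unseeded():
    """Draw 20 pages from a private generator created without a seed.

    With no seed, Python seeds the generator from os.urandom() when the
    operating system provides it, and falls back to the system time otherwise.
    """
    rng = random.Random()
    return rng.sample(pages, min(len(pages), 20))


if __name__ == "__main__":
    # The same seed always reproduces the same 20 lines.
    print(sample_with_seed(1234) == sample_with_seed(1234))
    # Unseeded generators are expected to differ from run to run.
    print(sample_unseeded())
```

If the cluster prints the same 20 lines on every run, one thing worth checking is whether the generator ends up in the same state before each call, for example because some other code calls random.seed() with a constant.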
While you could adjust the Linux configuration, it would be much easier to set the seed explicitly, for example with random.seed(int(time.time())**20%999979).
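A minimal sketch of explicit seeding follows. seed_from_answer() uses the expression quoted above unchanged. seed_from_clock_ns() and pick_pages() are hypothetical helpers added for illustration. The nanosecond clock is an alternative assumed here, not something the answer proposes. A private random.Random instance is used so that the global generator is left untouched.

```python
import random
import time


def seed_from_answer():
    """Seed computed with the expression quoted in the answer.

    int(time.time()) has one-second resolution, and the modulo limits
    the result to fewer than 999979 distinct values.
    """
    return int(time.time()) ** 20 % 999979


def seed_from_clock_ns():
    """Alternative seed (an assumption of this sketch): nanosecond clock."""
    return time.time_ns()


def pick_pages(pages, k=20, seed=None):
    """Sample up to k pages from a generator seeded explicitly on each call."""
    if seed is None:
        seed = seed_from_clock_ns()
    rng = random.Random(seed)
    return rng.sample(pages, min(len(pages), k))


if __name__ == "__main__":
    demo_pages = [f"page-{i}" for i in range(50)]  # hypothetical data
    print("---")
    for page in pick_pages(demo_pages):
        print(page)
```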
Linux in particular uses an entropy pool as its source of randomness, and it has been suggested that the issue might be mitigated by an upgrade to 5.6. In general, though, the entropy pool needs a short delay to gather the randomness required.
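For comparison, the standard library also offers random.SystemRandom, which draws from the operating system's source on every call and ignores random.seed(). The sketch below is a swapped-in alternative, not something the answer proposes, and the function name sample_from_os is hypothetical.

```python
import random


def sample_from_os(pages, k=20):
    """Sample up to k pages using the operating system's randomness source.

    random.SystemRandom reads from os.urandom() for every draw, so it keeps
    no seedable state and random.seed() has no effect on it.
    """
    rng = random.SystemRandom()
    return rng.sample(pages, min(len(pages), k))


if __name__ == "__main__":
    demo_pages = [f"page-{i}" for i in range(50)]  # hypothetical data
    print("---")
    for page in sample_from_os(demo_pages):
        print(page)
```

Whether this behaves differently on the cluster depends on the same operating system source, so it is a diagnostic aid rather than a guaranteed fix.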
If I were very concerned about avoiding this issue in the future, I would set up a queue and write a function that, when called, returns the number at the front of the queue, dequeues it, and then appends a new number to the back of the queue, computed as the product of the numbers still in it modulo some value. That way you should at least be guaranteed a source of randomness that you control.
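The queue idea can be sketched roughly as follows. The class name QueueRandom, the method name next_number, and the modulus are assumptions made for illustration. The answer does not specify a modulus, so the value from its seed expression is reused here.

```python
from collections import deque
from math import prod

# Hypothetical modulus, borrowed from the seed expression in the answer.
MODULUS = 999979


class QueueRandom:
    """Queue-based number source following the description above."""

    def __init__(self, initial_numbers):
        if len(initial_numbers) < 2:
            raise ValueError("need at least two initial numbers")
        self._queue = deque(initial_numbers)

    def next_number(self):
        # Return the number at the front of the queue and remove it.
        value = self._queue.popleft()
        # Append the product of the remaining numbers, reduced by the modulus.
        self._queue.append(prod(self._queue) % MODULUS)
        return value


if __name__ == "__main__":
    source = QueueRandom([3, 5, 7])
    print([source.next_number() for _ in range(4)])
```

Note that the output is fully determined by the initial numbers, so they still have to come from something that varies between runs. A zero anywhere in the queue also makes every later product zero, so a real implementation would need to guard against that.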
random.sample() create different results on one system and always the same on the other? Consecutive calls to any random function usually create different results without seeding in between – FEZ Commented Jan 30 at 22:20