SETUP
I have a list days and a value N:
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
WHAT I AM TRYING TO DO
I am trying to create a list selections of length N in which the values from days appear with uniform frequency (remainders are fine). I would like the order of this list to then be shuffled.
EXAMPLE OUTPUT
NOTE HOW THE ORDER IS SHUFFLED, BUT THE DISTRIBUTION OF VALUES IS UNIFORM
selections
['Wednesday','Friday','Monday',...'Tuesday','Thursday','Monday']
import collections
counter = collections.Counter(selections)
counter
Counter({'Monday': 11, 'Tuesday': 10, 'Wednesday': 11, 'Thursday': 10, 'Friday': 10})
WHAT I HAVE TRIED
I have code to randomly select N values from days:
from random import choice, seed
seed(1)
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
selections = [choice(days) for x in range(N)]
But the values aren't selected with uniform frequency:
import collections
counter = collections.Counter(selections)
counter
Counter({'Tuesday': 9,
'Friday': 8,
'Monday': 14,
'Wednesday': 7,
'Thursday': 14})
How can I adjust this code, or what different method will create a list of length N with a uniform distribution of values from days in a random order?
EDIT: I seem to have phrased this question poorly. I am looking for a list of length N with a uniform distribution of values from days, but in a shuffled order (that is what I meant by random). So I suppose what I am looking for is how to uniformly sample values from days N times and then shuffle that list. Again, I want an equal number of each value from days making up a list of length N. I need a uniform distribution for a list of exactly length 52, just as the example output shows.
5 Answers
The code you have is correct. You are seeing expected noise around the mean.
Note that for higher N, the relative noise decreases, as expected. For example, this is what you get for N = 10000000:
Counter({'Tuesday': 2000695, 'Thursday': 2000615, 'Wednesday': 2000096, 'Monday': 1999526, 'Friday': 1999068})
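To see that effect for yourself, here is a minimal sketch (not part of the original answer). It draws with choice for a few sizes and prints the largest relative deviation of any day's count from the ideal N / len(days). The helper name max_relative_deviation and the sizes used are made up for illustration.

```python
import random
from collections import Counter

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

def max_relative_deviation(n, rng):
    """Largest |count - expected| / expected over all days for n random draws."""
    counts = Counter(rng.choice(days) for _ in range(n))
    expected = n / len(days)
    # Look up every day explicitly so a day that was never drawn counts as 0.
    return max(abs(counts[day] - expected) / expected for day in days)

rng = random.Random(1)  # private generator, so the global seed is untouched
for n in (52, 5_200, 520_000):
    print(n, round(max_relative_deviation(n, rng), 4))
```

The printed deviation should shrink as n grows, roughly in proportion to 1 / sqrt(n), while the absolute differences between counts keep growing.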
If you need equal or approximately equal (deterministic, rather than random) numbers of each element in random order, try a combination of itertools.cycle, itertools.islice and random.shuffle like so:
import random
import collections
import itertools
random.seed(1)
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
# If `N` is not divisible by `len(days)`, this line ensures that the last
# `N % len(days)` elements of `selections` also stay random:
random.shuffle(days)
selections = list(itertools.islice(itertools.cycle(days), N))
random.shuffle(selections)
print(selections)
counter = collections.Counter(selections)
print(counter)
Output:
['Friday', 'Friday', 'Wednesday', ..., 'Thursday']
Counter({'Tuesday': 11, 'Monday': 11, 'Friday': 10, 'Wednesday': 10, 'Thursday': 10})
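If you need this in more than one place, the same steps can be wrapped in a function. This is only a sketch: the name balanced_shuffle and the optional rng parameter are my own additions, and the function copies the input so that the caller's list is not reordered (the snippet above shuffles days in place).

```python
import itertools
import random
from collections import Counter

def balanced_shuffle(items, n, rng=random):
    """Return n items in random order with per-item counts differing by at most 1."""
    pool = list(items)          # copy, so the caller's list is not reordered
    rng.shuffle(pool)           # randomizes which items receive the remainder
    selections = list(itertools.islice(itertools.cycle(pool), n))
    rng.shuffle(selections)     # randomizes the final order
    return selections

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
selections = balanced_shuffle(days, 52)
print(Counter(selections))
```

For N = 52 and five days, three days appear 10 times and two appear 11 times. Which two get the extra slot changes from run to run.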
According to the documentation:
For integers, there is uniform selection from a range. For sequences, there is uniform selection of a random element, a function to generate a random permutation of a list in-place, and a function for random sampling without replacement. [emphasis mine]
The differences you are seeing come down to randomness, as you might imagine.
For demonstration, I've set up the same test using choice, uniform, and randint. You'll notice they all provide similar (random) results:
from collections import Counter
from random import choice, seed, uniform, randint
# seed(1)
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
selections = [choice(days) for _ in range(N)] # what you're doing now
uni = [days[int(uniform(0, len(days)))] for _ in range(N)]
randi = [days[randint(0, len(days) - 1)] for _ in range(N)]
print(Counter(selections))
print(Counter(uni))
print(Counter(randi))
Output from a random sample:
Counter({'Tuesday': 16, 'Thursday': 13, 'Wednesday': 9, 'Monday': 7, 'Friday': 7})
Counter({'Friday': 14, 'Wednesday': 11, 'Monday': 10, 'Thursday': 9, 'Tuesday': 8})
Counter({'Friday': 15, 'Monday': 12, 'Wednesday': 10, 'Tuesday': 9, 'Thursday': 6})
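To put a rough number on how much spread to expect, here is a back-of-the-envelope sketch. It assumes each draw is independent and each day has probability 1 / len(days), so the count for a single day follows a binomial distribution with standard deviation sqrt(N * p * (1 - p)). The variable names are illustrative only.

```python
import math

n_draws = 52
n_days = 5
p = 1 / n_days

expected = n_draws * p                       # mean count per day
std_dev = math.sqrt(n_draws * p * (1 - p))   # binomial standard deviation

print(f"expected count per day: {expected:.1f}")
print(f"standard deviation:     {std_dev:.2f}")
```

With a mean of about 10.4 and a standard deviation of about 2.9, counts such as 7 or 16 in the outputs above are within roughly two standard deviations and are not a sign that anything is wrong.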
You could build a list of days with a uniform distribution (not randomly) then just shuffle it.
Something like this:
import random
from collections import Counter
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
# populate lista with at least N values then truncate it to the required length
lista = (days * (N//len(days)+1))[:N]
# demonstrate uniformity
print(Counter(lista))
random.shuffle(lista)
print(lista)
If you want a uniform distribution (which isn't random at all), then you really want to use random choices only for the remainder, which is N % len(days).
In your example, N is 52 and there are five days in the list, so that's ten occurrences of each day, leaving two remaining choices for random additional days (and you should ensure the same day isn't chosen twice).
So, make a new list with N // len(days) copies of days, shuffle the list, then add N % len(days) additional random choices.
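A minimal sketch of those steps might look like the following. It uses random.sample for the remainder so that no day is picked twice. One deliberate difference from the description above: it shuffles after adding the extra days rather than before, so that the extras do not always sit at the end of the list.

```python
import random
from collections import Counter

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52

full_rounds, remainder = divmod(N, len(days))

# Every day appears full_rounds times.
selections = days * full_rounds
# random.sample draws without replacement, so no day gets two extra slots.
selections += random.sample(days, remainder)
# Shuffle last so the extra days are not always at the end of the list.
random.shuffle(selections)

print(Counter(selections))
```

Because remainder is always smaller than len(days), random.sample never asks for more items than the list holds.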
The frequencies don't change, so a simpler solution would be to just assign frequencies to a randomly-shuffled list of keys:
#!/usr/bin/env python
import random
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52
random.shuffle(days)
n_days = len(days)
days_counter = {day: 0 for day in days}
for i in range(N):
    # Walk the shuffled days round-robin; the first N % n_days days get one extra.
    day = days[i % n_days]
    days_counter[day] += 1
assert sum(days_counter.values()) == N
print(days_counter)
If you then need a uniform sample of days from these frequencies, you can use rejection sampling:
days_sample = []
while len(days_sample) < N:
    # Pick a random day and accept it only if it still has occurrences left.
    day_idx = random.randint(0, n_days - 1)
    day = days[day_idx]
    if days_counter[day] > 0:
        days_counter[day] -= 1
        days_sample.append(day)
assert len(days_sample) == N
print(days_sample)
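The rejection loop works, but it discards more and more draws as the counters run out, and it leaves days_counter at zero afterwards. As an alternative (not what the answer above does), the frequencies can be expanded directly with Counter.elements() and then shuffled. The names base, extra and frequencies below are illustrative.

```python
import random
from collections import Counter

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52

random.shuffle(days)  # decides which days receive the remainder
base, extra = divmod(N, len(days))
# The first `extra` days in the shuffled order get one additional occurrence.
frequencies = Counter({day: base + (i < extra) for i, day in enumerate(days)})

# elements() repeats each key as many times as its count.
days_sample = list(frequencies.elements())
random.shuffle(days_sample)

print(frequencies)
print(len(days_sample))
```

This produces the same frequencies as the round-robin loop and keeps them intact for later inspection.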
Source: Create an N length list by uniformly (in frequency) selecting items from a separate list in python - Stack Overflow
days of length N? I updated my answer for this use case. – Timur Shtatland, commented Mar 13 at 18:21