sorting - How to randomly select by type probability all items from a google sheet column range - Stack Overflow

IT技术

更新时间：2025-01-300

admin管理员组
文章数量:1181894

Here is a link (editable) to a detailed example of what I am stuck with: there is the input, the expected result and comments on the reasoning.

Basically, let's say I have in a column range a list of 1000 elements belonging to three types of object (it would be better if it could also work with more than three types). Each type has an associated probability. I want to sort this input column range so that each element is randomly picked according to their type while keeping the original relative order with this type in the input column, and whenever all elements of a type have been used, then we do the same while ignoring the items from the used up types.

So if there were only two types, with a probability of 0.000001 for Birds and 0.999999 for Insects, then we would expect to find all the Insects in the same order as the original list at the top, and the Birds at the bottom in the same order too.

I'm not sure if this kind of sorting has a name. At first my idea was to use a modified weighed random sort, but I couldn't make it keep the original order of the list for each type.

I have then written a working solution using countif to know how many items from the current type have been selected so far, and then an indirect formula to know which cell value to look at. Sadly, it's incredibly slow (around 20 seconds...), and that's a big no no as I need it to work at a reasonable speed with around 2000 items.

Could anyone help me with a more efficient solution please?

Here is a link (editable) to a detailed example of what I am stuck with: there is the input, the expected result and comments on the reasoning.

Basically, let's say I have in a column range a list of 1000 elements belonging to three types of object (it would be better if it could also work with more than three types). Each type has an associated probability. I want to sort this input column range so that each element is randomly picked according to their type while keeping the original relative order with this type in the input column, and whenever all elements of a type have been used, then we do the same while ignoring the items from the used up types.

So if there were only two types, with a probability of 0.000001 for Birds and 0.999999 for Insects, then we would expect to find all the Insects in the same order as the original list at the top, and the Birds at the bottom in the same order too.

I'm not sure if this kind of sorting has a name. At first my idea was to use a modified weighed random sort, but I couldn't make it keep the original order of the list for each type.

I have then written a working solution using countif to know how many items from the current type have been selected so far, and then an indirect formula to know which cell value to look at. Sadly, it's incredibly slow (around 20 seconds...), and that's a big no no as I need it to work at a reasonable speed with around 2000 items.

Could anyone help me with a more efficient solution please?

Share Improve this question edited yesterday asked yesterday Dreamk33 2471 gold badge3 silver badges13 bronze badges

Add a comment |

2 Answers 2

Sorted by: Reset to default 3

Added one approach in your test sheet(separate tab); Do test it out extensively to see if its holding up for varied scenarios and all...

=let(Ξ,tocol(map(K3:K27,lambda(Λ,let(Δ,lambda(x,index(xlookup(x,G3:G5,E3:E5,,-1))),chooserows(filter(B:B,C:C=Δ(Λ)),countif(Δ(K3:Λ),Δ(Λ)))))),3),
 vstack(Ξ,filter(B3:B27,isna(xmatch(B3:B27,Ξ)))))

Here's a possible solution

=ARRAYFORMULA(LET(
   INPUT.val, B3:B,
   INPUT.type, C3:C,
   PROB.type, E3:E,
   PROB.prob, F3:F,
   rand, TOCOL(K3:K, 1),
   NORMALIZE, LAMBDA(table, LET(
     type, INDEX(table,,1), 
     prob, INDEX(table,,2),
     HSTACK( 
       {0; IF(ROWS(table) = 1 ,TOCOL(,1), SCAN(, ARRAY_CONSTRAIN(prob, ROWS(prob)-1, 1) / SUM(prob), LAMBDA(a, c, a + c)))},
       type
     )
   )),
   GROUP, LAMBDA(t, LET(u, UNIQUE(TOCOL(INPUT.type, 1)), {u, COUNTIF(t, u)})),
   global_counts, GROUP(INPUT.type),
   REDUCE(TOCOL(,1), rand, LAMBDA(result, cur_rand, LET(
     local_counts, GROUP(VLOOKUP(result, {INPUT.val, INPUT.type}, 2, )),
     prob_table, NORMALIZE(
       FILTER(
         {PROB.type, PROB.prob},
         VLOOKUP(PROB.type, local_counts, 2, ) < VLOOKUP(PROB.type, global_counts, 2, )
       )
     ),
     cur_type, VLOOKUP(cur_rand, prob_table, 2),
     VSTACK(
       result,
       INDEX(
         FILTER(INPUT.val, INPUT.type = cur_type),
         1 + VLOOKUP(cur_type, local_counts, 2, )
       )
     )
   )))
 ))

How it works

First, we assign a readable name to the ranges using the LET function

=LET(
   INPUT.val, B3:B,
   INPUT.type, C3:C,
   PROB.type, E3:E,
   PROB.prob, F3:F,

Then we define a function NORMALIZE(table) that takes as input a table in the form {type, probability} and returns the normalized probability table

   NORMALIZE, LAMBDA(table, LET(
     type, INDEX(table,,1), 
     prob, INDEX(table,,2),
     HSTACK( 
       {0; IF(ROWS(table) = 1 ,TOCOL(,1), SCAN(, ARRAY_CONSTRAIN(prob, ROWS(prob)-1, 1) / SUM(prob), LAMBDA(a, c, a + c)))},
       type
     )
   ))

For example, if table is:


COUNTRY	0.6
COLOUR	0.3
NUMBER	0.1

NORMALIZE(table) is:


0	COUNTRY
0.6	COLOUR
0.9	NUMBER

If table is:


COUNTRY	0.6
NUMBER	0.1

NORMALIZE(table) is:


0	COUNTRY
0.857	NUMBER

The normalized table puts the cumulative probabilities on the left so it's easier to perform a vertical lookup.

Then we define a function GROUP(t) that takes as input an array of types (t) and for each unique type in C3:C, returns how many times it occurs in t

   GROUP, LAMBDA(t, LET(u, UNIQUE(TOCOL(INPUT.type, 1)), {u, COUNTIF(t, u)}))

For example, if t is:


COUNTRY
NUMBER

GROUP(t) is:


COUNTRY	1
COLOUR	0
NUMBER	1

We then define a variable global_counts that counts how many times each type occurs in C3:C using the GROUP function we've just defined

   global_counts, GROUP(INPUT.type)

Now, we use REDUCE to iterate over each random number in K3:K

   REDUCE(TOCOL(,1), rand, LAMBDA(result, cur_rand,

Then we calculate the current counts using the GROUP function

     local_counts, GROUP(VLOOKUP(result, {INPUT.val, INPUT.type}, 2, ))

And the normalized probability table, while filtering out the types that were already used

     prob_table, NORMALIZE(
       FILTER(
         {PROB.type, PROB.prob},
         VLOOKUP(PROB.type, local_counts, 2, ) < VLOOKUP(PROB.type, global_counts, 2, )
       )
     ),

We can now calculate the current type associated to the current random number using a simple VLOOKUP

     cur_type, VLOOKUP(cur_rand, prob_table, 2),

Finally, we append the value associated to the current type to the result

     VSTACK(
       result,
       INDEX(
         FILTER(INPUT.val, INPUT.type = cur_type),
         1 + VLOOKUP(cur_type, local_counts, 2, )
       )
     )

本文标签： sortingHow to randomly select by type probability all items from a google sheet column rangeStack Overflow

版权声明：本文标题：sorting - How to randomly select by type probability all items from a google sheet column range - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1738197948a2068255.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

sorting - How to randomly select by type probability all items from a google sheet column range - Stack Overflow

2 Answers 2

How it works

更多相关文章

sorting - How to randomly select by type probability all items from a google sheet column range - Stack Overflow

发表评论

推荐文章

kubernetes - kubectl patch deployment to new configmap not working - Stack Overflow

javascript - how get get file name in file chooser in react? - Stack Overflow

flutter - AdMob NativeAd should be loaded every time it is shown? - Stack Overflow

javascript - Pass input tag value to as href parameter - Stack Overflow

javascript - How to find all the posts with a specific tag in Ghost and iterate over them? - Stack Overflow

热门文章

testing - STM32 Canbus Multiple Device Weird İssue - Stack Overflow

操作系统的下载流程，Linux的下载

Emeditor , python tool only works if there are no spaces in the filename - Stack Overflow

php - Magento :: Translate text from javascript files - Stack Overflow

javascript - Deserialize a JSON object in Fabric.js - Stack Overflow

javascript - How do I make a field autofocus? - Stack Overflow

javascript - is it okay to pass large variable into React state? - Stack Overflow

sql - Postgresql with accent-insensitive and case-insensitive pattern matching - Stack Overflow

ggplot2 - Geom_line and geom_point issues to distinguish variations over time from pivot_longer data in R - Stack Overflow

WebRTC Connection Fails with OpenAI Realtime API in Production - Stack Overflow

最新文章

Photoshop 的22年 - 从 1.0 到 CS6

50款森系暗黑风格调色，Lightroom预设合集手机滤镜下载

八大在线PS工具，你不能错过的编辑利器

Adobe彻底改造Photoshop 推移动版本

AI绘画与修图：重塑数字艺术的新纪元

javascript - Is iteration faster than recursion, or just less prone to stack overflows? - Stack Overflow

javascript - Algorithm for elliptic curve point compression - Stack Overflow

javascript - How include an external JS file in a JSP page - Stack Overflow

javascript - knex: what is the appropriate way to create an array from results? - Stack Overflow

How to filter a very large bootstrap table using pure Javascript - Stack Overflow

惠普OMEN 15-CE001TX 2EF91PA参数报价

苹果新款MacBook Pro 15英寸 i732GB1TBVega Pro 20参数报价

联想Y330A-PSE L参数报价

神舟战神Z7 D6 i7-12650H16GB512GBRTX4050旗舰版参数报价

神舟战神Z7 D6 i7-12650H16GB1TBRTX4050参数报价