r - How to number groups of randomly repeated values into sets of n unique values? - Stack Overflow


How can I create a new grp variable that divides row into groups of n unique row values each, within type? For example, with n = 4 below, the first four distinct values of row (each of which may be repeated any number of times) form grp 1 (red), the next four distinct values form grp 2 (blue), and so on. Ideally this would be a function that lets n be changed as desired, but that is not essential.

Within type, row is always in ascending order but not necessarily consecutive, and the number of repetitions of each value varies randomly.

Edit: in addition, to guard against the case where the number of unique row values for a given type is not an exact multiple of n (see the comments on the answers), grp should ideally be NA for every row of that type.

Note: this is a small example; my real data has thousands of rows and types, with many more repetitions.

Initial and desired data:

Initial data:

dat0 <-
structure(list(type = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b"), row = c(5, 
5, 6, 8, 8, 8, 10, 11, 11, 11, 13, 13, 14, 14, 18, 18, 18, 3, 
4, 4, 4, 6, 6, 7, 7, 7, 9, 9, 10, 10, 10, 12, 16, 16, 21, 22, 
22, 22, 23, 23, 28, 28, 28, 28)), row.names = c(NA, -44L), class = c("tbl_df", 
"tbl", "data.frame"))

asked Feb 11 at 18:02, edited Feb 13 at 6:01, by denis

To clarify, when you say "the multiple of 4 of the row repetitions remains constant" -- do you mean the number of unique row values in the data will always be a whole multiple of 4 within each type? – zephryl Commented Feb 11 at 19:45
Yes if the unique row values you are referring to are unique after grouping the repetitions. – denis Commented Feb 11 at 20:58

3 Answers

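library(dplyr)

dat0 %>% 
  distinct() %>% 
  # number the unique rows in blocks of 4, restarting within each type
  mutate(id = gl(n()/4, 4), .by = type) %>% 
  # attach the block id back to every (repeated) row
  right_join(dat0, by = c("type", "row"))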

#> # A tibble: 44 × 3
#>    type    row id   
#>    <chr> <dbl> <fct>
#>  1 a         5 1    
#>  2 a         5 1    
#>  3 a         6 1    
#>  4 a         8 1    
#>  5 a         8 1    
#>  6 a         8 1    
#>  7 a        10 1    
#>  8 a        11 2    
#>  9 a        11 2    
#> 10 a        11 2    
#> # ℹ 34 more rows

Created on 2025-02-11 with reprex v2.1.1

You could convert row to a factor and relabel its levels so that every n consecutive levels share the same label.

> f <- \(x, n=4) {
+   xs <- factor(x)
+   ln <- length(levels(xs))
+   stopifnot(!ln %% n)  ## stops if groups would be uneven
+   levels(xs) <- rep(seq_len(ln/n), each=n)
+   xs
+ }
> 
> dat0 |> transform(g=ave(row, type, FUN=f))
   type row g
1     a   5 1
2     a   5 1
3     a   6 1
4     a   8 1
5     a   8 1
6     a   8 1
7     a  10 1
8     a  11 2
9     a  11 2
10    a  11 2
11    a  13 2
12    a  13 2
13    a  14 2
14    a  14 2
15    a  18 2
16    a  18 2
17    a  18 2
18    b   3 1
19    b   4 1
20    b   4 1
21    b   4 1
22    b   6 1
23    b   6 1
24    b   7 1
25    b   7 1
26    b   7 1
27    b   9 2
28    b   9 2
29    b  10 2
30    b  10 2
31    b  10 2
32    b  12 2
33    b  16 2
34    b  16 2
35    b  21 3
36    b  22 3
37    b  22 3
38    b  22 3
39    b  23 3
40    b  23 3
41    b  28 3
42    b  28 3
43    b  28 3
44    b  28 3

This also works if the data are not sorted:

> dat0[sample.int(nrow(dat0)), ] |> 
+   transform(g=ave(row, type, FUN=f)) |> 
+   sort_by(~list(type, row))
   type row g
18    a   5 1
27    a   5 1
10    a   6 1
26    a   8 1
36    a   8 1
44    a   8 1
17    a  10 1
14    a  11 2
15    a  11 2
38    a  11 2
8     a  13 2
12    a  13 2
1     a  14 2
43    a  14 2
16    a  18 2
30    a  18 2
41    a  18 2
29    b   3 1
13    b   4 1
19    b   4 1
23    b   4 1
35    b   6 1
37    b   6 1
2     b   7 1
28    b   7 1
40    b   7 1
7     b   9 2
25    b   9 2
4     b  10 2
9     b  10 2
22    b  10 2
31    b  12 2
21    b  16 2
32    b  16 2
33    b  21 3
3     b  22 3
20    b  22 3
24    b  22 3
6     b  23 3
39    b  23 3
5     b  28 3
11    b  28 3
34    b  28 3
42    b  28 3

To check whether a chosen n is suitable, you could divide the number of unique row values per type by n and see whether the result is a whole number, something like:

> with(unique(dat0[c('type', 'row')]), tapply(row, type, length))/4
a b 
2 3 
> with(unique(dat0[c('type', 'row')]), tapply(row, type, length))/6
       a        b 
1.333333 2.000000
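The same divisibility check can be folded into the grouping function itself, so that a type whose number of unique row values is not a multiple of n gets NA throughout, as the question's edit asks, instead of stopping with an error. What follows is only a sketch in base R: the helper name grp_or_na and the toy data frame are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (not from the answer): like f(), but returns NA for the
# whole group when the number of unique values is not a multiple of n.
grp_or_na <- function(x, n = 4) {
  ux <- sort(unique(x))                  # distinct values in ascending order
  if (length(ux) %% n != 0) {
    return(rep(NA_integer_, length(x)))  # uneven: flag every row of this type
  }
  # position of each value among the distinct values, cut into blocks of n
  as.integer((match(x, ux) - 1) %/% n + 1)
}

# Made-up toy data: type "a" has 4 unique values, type "b" has 3
toy <- data.frame(
  type = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  row  = c(1, 1, 2, 5, 7, 2, 3, 3, 9)
)
toy$grp <- ave(toy$row, toy$type, FUN = function(x) grp_or_na(x, n = 2))
toy
```

With n = 2, type "a" is split into two groups of two unique values each, while type "b" (three unique values) is NA in every row.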

Using dplyr::consecutive_id() with floor division:

library(dplyr)

grp_n_vals <- function(x, n) (consecutive_id(x) + n - 1) %/% n
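To see why the floor division works, it may help to run the arithmetic on plain integers. In this sketch, ids is just a stand-in for what consecutive_id() returns (one id per run of equal values), so dplyr is not needed:

```r
n   <- 4
ids <- 1:9                  # stand-in for consecutive_id(): 1, 2, 3, ...
grp <- (ids + n - 1) %/% n  # ids 1..4 -> 1, ids 5..8 -> 2, id 9 -> 3
grp
#> [1] 1 1 1 1 2 2 2 2 3
```

Note that with 9 ids the last group holds only one id: this formula does not check that the number of unique values is a multiple of n, so that check would have to be added separately if it matters.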

Although you specified that row is always in order within type, if you wanted to relax that assumption, you can sort your values and then "unsort" the result:

grp_n_vals <- function(x, n) {
  ord <- order(order(x))
  out <- (consecutive_id(sort(x)) + n - 1) %/% n
  out[ord]
}
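The "unsort" step relies on order(order(x)) giving, for each element, the position it ends up in after sorting. A small base R illustration with made-up values:

```r
x   <- c(8, 5, 6, 5)
ord <- order(order(x))  # where each element of x lands once x is sorted
ord
#> [1] 4 1 3 2
sort(x)[ord]            # indexing the sorted vector by ord restores x
#> [1] 8 5 6 5
```

Anything computed on sort(x), such as the group numbers, can be put back in the original row order the same way.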

Result using either version:

dat0 %>%
  mutate(grp = grp_n_vals(row, n = 4), .by = type) %>% 
  print(n = Inf)
# # A tibble: 44 × 3
#    type    row   grp
#    <chr> <dbl> <dbl>
#  1 a         5     1
#  2 a         5     1
#  3 a         6     1
#  4 a         8     1
#  5 a         8     1
#  6 a         8     1
#  7 a        10     1
#  8 a        11     2
#  9 a        11     2
# 10 a        11     2
# 11 a        13     2
# 12 a        13     2
# 13 a        14     2
# 14 a        14     2
# 15 a        18     2
# 16 a        18     2
# 17 a        18     2
# 18 b         3     1
# 19 b         4     1
# 20 b         4     1
# 21 b         4     1
# 22 b         6     1
# 23 b         6     1
# 24 b         7     1
# 25 b         7     1
# 26 b         7     1
# 27 b         9     2
# 28 b         9     2
# 29 b        10     2
# 30 b        10     2
# 31 b        10     2
# 32 b        12     2
# 33 b        16     2
# 34 b        16     2
# 35 b        21     3
# 36 b        22     3
# 37 b        22     3
# 38 b        22     3
# 39 b        23     3
# 40 b        23     3
# 41 b        28     3
# 42 b        28     3
# 43 b        28     3
# 44 b        28     3
