How can I create a new `grp` variable that divides `row` into groups of n unique `row` values each, within `type`? For example, below, for n = 4, the first four subgroups of repeated (or unique) values of `row` are `grp` 1 (red), the next four subgroups of repeated (or unique) values of `row` are `grp` 2 (blue), and so on. Ideally this would be a function that lets n be changed as desired, but that is not essential.

Within `type`, `row` is always in ascending order, though not necessarily consecutive, and the number of repetitions of each value can vary randomly.

Edit: in addition, as a safeguard for the case where the number of unique `row` values is not an exact multiple of n (here 4) for a given type (see the exchanges under the answers provided), `grp` would ideally return `NA` for that entire type.

Note: this is a small example, but my database has thousands of rows and types, with many more repetitions.

Initial and desired data:

Initial data:
dat0 <-
structure(list(type = c("a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b",
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b",
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b"), row = c(5,
5, 6, 8, 8, 8, 10, 11, 11, 11, 13, 13, 14, 14, 18, 18, 18, 3,
4, 4, 4, 6, 6, 7, 7, 7, 9, 9, 10, 10, 10, 12, 16, 16, 21, 22,
22, 22, 23, 23, 28, 28, 28, 28)), row.names = c(NA, -44L), class = c("tbl_df",
"tbl", "data.frame"))
Asked Feb 11 at 18:02 by denis; edited Feb 13 at 6:01.
3 Answers
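library(dplyr)

dat0 %>%
  distinct() %>%
  # number every 4 distinct rows, restarting the count within each type
  mutate(id = gl(n() / 4, 4), .by = type) %>%
  # map the group ids back onto the repeated rows
  right_join(dat0, by = c("type", "row"))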
#> # A tibble: 44 × 3
#> type row id
#> <chr> <dbl> <fct>
#> 1 a 5 1
#> 2 a 5 1
#> 3 a 6 1
#> 4 a 8 1
#> 5 a 8 1
#> 6 a 8 1
#> 7 a 10 1
#> 8 a 11 2
#> 9 a 11 2
#> 10 a 11 2
#> # ℹ 34 more rows
Created on 2025-02-11 with reprex v2.1.1
You could use `factor` and relevel them.
> f <- \(x, n=4) {
+ xs <- factor(x)
+ ln <- length(levels(xs))
+ stopifnot(!ln %% n) ## stops if groups would be uneven
+ levels(xs) <- rep(seq_len(ln/n), each=n)
+ xs
+ }
>
> dat0 |> transform(g=ave(row, type, FUN=f))
type row g
1 a 5 1
2 a 5 1
3 a 6 1
4 a 8 1
5 a 8 1
6 a 8 1
7 a 10 1
8 a 11 2
9 a 11 2
10 a 11 2
11 a 13 2
12 a 13 2
13 a 14 2
14 a 14 2
15 a 18 2
16 a 18 2
17 a 18 2
18 b 3 1
19 b 4 1
20 b 4 1
21 b 4 1
22 b 6 1
23 b 6 1
24 b 7 1
25 b 7 1
26 b 7 1
27 b 9 2
28 b 9 2
29 b 10 2
30 b 10 2
31 b 10 2
32 b 12 2
33 b 16 2
34 b 16 2
35 b 21 3
36 b 22 3
37 b 22 3
38 b 22 3
39 b 23 3
40 b 23 3
41 b 28 3
42 b 28 3
43 b 28 3
44 b 28 3
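The relabelling step inside `f` relies on the fact that assigning duplicated labels to `levels()` merges the corresponding factor levels. A minimal base R sketch of just that step, on a toy vector that is not part of the original data, with n = 2:

```r
# Toy vector (an assumption for illustration): distinct values are 5, 6, 8, 10.
x  <- c(5, 5, 6, 8, 10)
xs <- factor(x)

# Assigning duplicated labels merges levels: 5 and 6 become "1", 8 and 10 become "2".
levels(xs) <- rep(seq_len(2), each = 2)

grp <- as.integer(xs)
print(grp)  # 1 1 1 2 2
```

Because the levels of a factor are sorted by value, this grouping does not depend on the order of the input, which is why the same function also copes with shuffled data.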
This also works if data is in disorder:
> dat0[sample.int(nrow(dat0)), ] |>
+ transform(g=ave(row, type, FUN=f)) |>
+ sort_by(~list(type, row))
type row g
18 a 5 1
27 a 5 1
10 a 6 1
26 a 8 1
36 a 8 1
44 a 8 1
17 a 10 1
14 a 11 2
15 a 11 2
38 a 11 2
8 a 13 2
12 a 13 2
1 a 14 2
43 a 14 2
16 a 18 2
30 a 18 2
41 a 18 2
29 b 3 1
13 b 4 1
19 b 4 1
23 b 4 1
35 b 6 1
37 b 6 1
2 b 7 1
28 b 7 1
40 b 7 1
7 b 9 2
25 b 9 2
4 b 10 2
9 b 10 2
22 b 10 2
31 b 12 2
21 b 16 2
32 b 16 2
33 b 21 3
3 b 22 3
20 b 22 3
24 b 22 3
6 b 23 3
39 b 23 3
5 b 28 3
11 b 28 3
34 b 28 3
42 b 28 3
To see whether a chosen `n` is suitable, you could check that the `length` of the `unique` rows in each group, divided by `n`, gives an integer, something like:
> with(unique(dat0[c('type', 'row')]), tapply(row, type, length))/4
a b
2 3
> with(unique(dat0[c('type', 'row')]), tapply(row, type, length))/6
a b
1.333333 2.000000
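The same check could be wrapped in a small helper that returns one logical value per type. This is only a sketch: `n_fits` and the `toy` data frame are hypothetical names introduced here, and the column names `type` and `row` are assumed to match the question's data.

```r
# Hypothetical helper: TRUE for each type whose number of distinct `row`
# values is an exact multiple of n, FALSE otherwise.
n_fits <- function(data, n) {
  u <- unique(data[c("type", "row")])
  vapply(split(u$row, u$type), function(v) length(v) %% n == 0, logical(1))
}

# Toy data: type "a" has 4 distinct rows, type "b" has 3.
toy <- data.frame(
  type = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  row  = c(1, 1, 2, 3, 4, 2, 5, 5, 7)
)

fits <- n_fits(toy, n = 4)
print(fits)  # a: TRUE, b: FALSE
```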
Using `dplyr::consecutive_id()` with floor division:
library(dplyr)
grp_n_vals <- function(x, n) (consecutive_id(x) + n - 1) %/% n
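To see why the floor division works, here is the arithmetic on its own in base R, with a plain sequence standing in for the ids that `consecutive_id()` would produce (an assumption for illustration): shifting by `n - 1` before integer division is the same as taking `ceiling(id / n)`.

```r
n   <- 4
ids <- 1:9                   # stand-in for consecutive ids of distinct values
grp <- (ids + n - 1) %/% n   # shift by n - 1, then integer-divide by n

print(grp)                   # 1 1 1 1 2 2 2 2 3

# Same result as rounding id / n up to the next whole number.
same_as_ceiling <- all(grp == ceiling(ids / n))
```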
Although you specified that `row` is always in order within `type`, if you wanted to relax that assumption, you can sort your values and then "unsort" the result:
grp_n_vals <- function(x, n) {
ord <- order(order(x))
out <- (consecutive_id(sort(x)) + n - 1) %/% n
out[ord]
}
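The "unsort" step uses `order(order(x))`, which gives the position of each element of `x` within `sort(x)`. A small base R sketch on a toy vector (not from the original data) shows that indexing the sorted vector by it restores the original order:

```r
x   <- c(30, 10, 20, 10)
ord <- order(order(x))     # where each element of x lands in sort(x)
print(ord)                 # 4 1 3 2

restored <- sort(x)[ord]   # indexing the sorted vector undoes the sort
print(restored)            # 30 10 20 10
```

Anything computed on `sort(x)`, such as the group number, can be put back in the original row order the same way.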
Result using either version:
dat0 %>%
mutate(grp = grp_n_vals(row, n = 4), .by = type) %>%
print(n = Inf)
# # A tibble: 44 × 3
# type row grp
# <chr> <dbl> <dbl>
# 1 a 5 1
# 2 a 5 1
# 3 a 6 1
# 4 a 8 1
# 5 a 8 1
# 6 a 8 1
# 7 a 10 1
# 8 a 11 2
# 9 a 11 2
# 10 a 11 2
# 11 a 13 2
# 12 a 13 2
# 13 a 14 2
# 14 a 14 2
# 15 a 18 2
# 16 a 18 2
# 17 a 18 2
# 18 b 3 1
# 19 b 4 1
# 20 b 4 1
# 21 b 4 1
# 22 b 6 1
# 23 b 6 1
# 24 b 7 1
# 25 b 7 1
# 26 b 7 1
# 27 b 9 2
# 28 b 9 2
# 29 b 10 2
# 30 b 10 2
# 31 b 10 2
# 32 b 12 2
# 33 b 16 2
# 34 b 16 2
# 35 b 21 3
# 36 b 22 3
# 37 b 22 3
# 38 b 22 3
# 39 b 23 3
# 40 b 23 3
# 41 b 28 3
# 42 b 28 3
# 43 b 28 3
# 44 b 28 3
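For the question's additional request (return `NA` for a whole type when its number of distinct `row` values is not a multiple of n), one possible variant in base R is sketched below. `grp_or_na` and the `toy` data frame are hypothetical names, not part of any answer above, and the sketch does not assume that the values arrive sorted.

```r
# Hypothetical helper: group every n distinct values of x, or return NA for
# the whole vector when the number of distinct values is not a multiple of n.
grp_or_na <- function(x, n = 4) {
  if (length(x) == 0) return(integer(0))
  ids <- match(x, sort(unique(x)))   # dense rank 1..k of each value
  if (max(ids) %% n != 0) return(rep(NA_integer_, length(x)))
  as.integer((ids + n - 1) %/% n)
}

# Toy data: type "a" has 8 distinct rows (fits n = 4), type "b" has 3.
toy <- data.frame(
  type = rep(c("a", "b"), times = c(10, 4)),
  row  = c(1, 1, 2, 3, 4, 4, 5, 6, 7, 8, 2, 5, 5, 7)
)

# ave() applies the helper within each type; the wrapper passes n through.
toy$grp <- ave(toy$row, toy$type, FUN = function(v) grp_or_na(v, n = 4))
print(toy)
```

Here type "a" gets groups 1 and 2, while every row of type "b" gets `NA`.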
Question title: How to number groups of randomly repeated values into sets of n unique values? (tagged r, Stack Overflow)
`row` values in the data will always be a whole multiple of 4 within each `type`? (zephryl, commented Feb 11 at 19:45)