Let's assume there is a group of 3 persons, for which I have a time series of when they start and finish an activity. An example dataframe would be:
library(tidyverse)
GrXX <- tibble(Individual = rep(c("A", "B", "C"), times = c(2, 3, 5)),
               Frame_beginning = c(1, 16, 7, 21, 29, 3, 9, 12, 19, 27),
               Frame_end = c(3, 22, 15, 24, 30, 7, 10, 12, 24, 30),
               Duration = Frame_end - Frame_beginning + 1,
               Frame_end_valid_group = 30)
This group can be represented by the following figure:
I would love to shuffle this group around, so that the time blocks within individuals are randomly placed, without overlap within individuals (there can be overlap between individuals). Additionally, all original time blocks should lie fully on the shuffled timeline (i.e., they should not start or finish outside of the original timeline). One potential shuffling could be the following:
My question is how to do that in R, in an efficient way. I started some code, but it seems very complex for the task, and does not prevent all overlaps. Here it is:
GrXX_shuffled <- tibble()
for (i in c("A", "B", "C")) {
  temp <- GrXX %>%
    filter(Individual == i) %>%
    arrange(-Duration, Frame_beginning)
  valid_frames <- c()
  Frame_shuffled_start <- c()
  Frame_shuffled_end <- c()
  Frame_earliest <- 1
  for (j in 1:nrow(temp)) {
    temp2 <- temp[j, ]
    if(j == 1) {
      valid_frames <- 1:(temp2$Frame_end_valid_group - temp2$Duration + 1)
      temp2 <- temp2 %>%
        mutate(Frame_beginning_shuffled = sample(valid_frames, size = 1), .after = Frame_beginning) %>%
        mutate(Frame_end_shuffled = Frame_beginning_shuffled + Duration - 1, .after = Frame_end)
      GrXX_shuffled <- bind_rows(GrXX_shuffled, temp2)
    }
    if(j != 1) {
      valid_frames <- c(Frame_earliest:(Frame_shuffled_start[j-1] - temp2$Duration - 1),
                        (Frame_shuffled_end[j-1] + 2):(temp2$Frame_end_valid_group - temp2$Duration + 1))
      valid_frames <- valid_frames[valid_frames >= 1 & valid_frames <= 30]
      temp2 <- temp2 %>%
        mutate(Frame_beginning_shuffled = sample(valid_frames, size = 1), .after = Frame_beginning) %>%
        mutate(Frame_end_shuffled = Frame_beginning_shuffled + Duration - 1, .after = Frame_end)
      GrXX_shuffled <- bind_rows(GrXX_shuffled, temp2)
    }
    Frame_shuffled_start <- sort(c(Frame_shuffled_start, temp2$Frame_beginning_shuffled))
    Frame_shuffled_end <- sort(c(Frame_shuffled_end, temp2$Frame_end_shuffled))
    Frame_earliest <- min(valid_frames_2)
  }
}
The idea behind the code is to:
- for each individual select the block with the longest duration and shuffle it around on the empty timeline (and it should not start too late so that the full block is still fully on the timeline)
- then go for the next longest block (if it exists) and place it in the rest of the timeline, without overlap with any previous already-shuffled blocks
- repeat until no more blocks have to be shuffled
Any idea how to solve this efficiently?
Many thanks in advance :-)
Asked Nov 22, 2024 at 18:16 by KrisAnathema

Comments:
- I'm curious how you did your plotting. It's not part of your question, but it would be useful to add so others can reproduce your results. – Chris, commented Nov 23, 2024 at 1:29
- @Chris Well, the plotting was done manually in PowerPoint... – KrisAnathema, commented Nov 23, 2024 at 21:34
- It's a pity, ... – Chris, commented Nov 23, 2024 at 23:12
1 Answer
Here's an efficient function to do what was described. It assumes discrete timesteps as in the OP's example.
library(data.table)
library(RcppAlgos) # for `compositionsSample`
library(Rfast) # for `colShuffle` and `colCumSums`
shuffle <- function(dt, n = 1L) {
  f <- function(v, m) {
    k <- length(v)
    k1 <- k + 1L
    as.data.table(
      cbind(
        rep = rep(1:n, each = k),
        matrix(
          colCumSums(
            `dim<-`(
              rbind(
                `dim<-`(t(compositionsSample(0:(m - sum(v)), k + 1, TRUE,
                                             n = n)[,-k - 1]), NULL),
                `dim<-`(colShuffle(matrix(v, k, n)), NULL)
              ), c(2*k, n)
            )
          ) + 1:0,
          n*k, 2, TRUE, list(NULL, c("Frame_beginning", "Frame_end"))
        )
      )
    )[,`:=`(Duration = Frame_end - Frame_beginning + 1,
            Frame_end_valid_group = m)]
  }
  setorder(
    setcolorder(dt[,f(Duration, Frame_end_valid_group[1]), Individual], "rep"),
    rep, Individual
  )
}
Demonstrating:
shuffle(setDT(GrXX))[]
#>       rep Individual Frame_beginning Frame_end Duration Frame_end_valid_group
#>     <num>     <char>           <num>     <num>    <num>                 <num>
#>  1:     1          A               3         9        7                    30
#>  2:     1          A              27        29        3                    30
#>  3:     1          B               2         3        2                    30
#>  4:     1          B               5         8        4                    30
#>  5:     1          B              21        29        9                    30
#>  6:     1          C               1         5        5                    30
#>  7:     1          C               8         8        1                    30
#>  8:     1          C              12        13        2                    30
#>  9:     1          C              17        22        6                    30
#> 10:     1          C              26        29        4                    30
It can also do multiple shuffles using the n argument:
shuffle(GrXX, 3)[]
#>       rep Individual Frame_beginning Frame_end Duration Frame_end_valid_group
#>     <num>     <char>           <num>     <num>    <num>                 <num>
#>  1:     1          A              17        23        7                    30
#>  2:     1          A              25        27        3                    30
#>  3:     1          B               5         8        4                    30
#>  4:     1          B              16        17        2                    30
#>  5:     1          B              19        27        9                    30
#>  6:     1          C               4         4        1                    30
#>  7:     1          C               6         9        4                    30
#>  8:     1          C              11        12        2                    30
#>  9:     1          C              15        20        6                    30
#> 10:     1          C              25        29        5                    30
#> 11:     2          A               8        10        3                    30
#> 12:     2          A              19        25        7                    30
#> 13:     2          B               5         6        2                    30
#> 14:     2          B               8        16        9                    30
#> 15:     2          B              20        23        4                    30
#> 16:     2          C               3         7        5                    30
#> 17:     2          C              11        12        2                    30
#> 18:     2          C              14        17        4                    30
#> 19:     2          C              19        19        1                    30
#> 20:     2          C              21        26        6                    30
#> 21:     3          A              12        14        3                    30
#> 22:     3          A              23        29        7                    30
#> 23:     3          B               4         5        2                    30
#> 24:     3          B              10        13        4                    30
#> 25:     3          B              15        23        9                    30
#> 26:     3          C               2         7        6                    30
#> 27:     3          C               9        10        2                    30
#> 28:     3          C              12        16        5                    30
#> 29:     3          C              19        19        1                    30
#> 30:     3          C              23        26        4                    30
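As a quick sanity check, the two constraints from the question (every block stays on the timeline, and no blocks overlap within an individual) can be tested directly on a shuffled result. The helper below is only a sketch in base R: `check_shuffle` is a hypothetical name, not part of any package used here, and the example rows are copied from the first replicate of the output above.

```r
# Hypothetical helper: check the two constraints on one shuffled replicate.
check_shuffle <- function(df, timeline_end) {
  # Constraint 1: every block lies within frames 1..timeline_end.
  in_bounds <- all(df$Frame_beginning >= 1 & df$Frame_end <= timeline_end)
  # Constraint 2: within each individual, sorted blocks never overlap,
  # i.e. each block ends before the next one begins.
  no_overlap <- all(vapply(split(df, df$Individual), function(g) {
    g <- g[order(g$Frame_beginning), ]
    all(head(g$Frame_end, -1) < g$Frame_beginning[-1])
  }, logical(1)))
  c(in_bounds = in_bounds, no_overlap = no_overlap)
}

# First replicate (rep == 1) of the three-shuffle output, typed in by hand.
rep1 <- data.frame(
  Individual      = rep(c("A", "B", "C"), times = c(2, 3, 5)),
  Frame_beginning = c(17, 25, 5, 16, 19, 4, 6, 11, 15, 25),
  Frame_end       = c(23, 27, 8, 17, 27, 4, 9, 12, 20, 29)
)
res <- check_shuffle(rep1, timeline_end = 30)
print(res)

# A deliberately broken case: two blocks of "A" overlap on frames 5..6.
bad <- data.frame(Individual = c("A", "A"),
                  Frame_beginning = c(1, 5),
                  Frame_end = c(6, 9))
res_bad <- check_shuffle(bad, timeline_end = 30)
```

To check several replicates at once, the same helper could be applied to each `rep` group separately, for example after `split(df, df$rep)`.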
The `shuffle` function is also fast:
microbenchmark::microbenchmark(shuffle(GrXX, 100))
#> Unit: milliseconds
#>                expr    min     lq     mean  median      uq    max neval
#>  shuffle(GrXX, 100) 1.9616 2.1486 2.343154 2.22235 2.36755 5.8275   100
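For readers who would rather avoid the extra packages, a shuffle of this kind can also be sketched in base R with a "stars and bars" draw: the free frames (timeline length minus total block duration) are split at random into gaps before, between and after the blocks, and the blocks are then laid out in random order. This is an illustration under stated assumptions, not a translation of the function above: `shuffle_blocks` and `GrXX_df` are hypothetical names, the helper handles one individual at a time, and it allows blocks to sit directly next to each other (a gap of zero frames).

```r
# Hypothetical helper: shuffle one individual's blocks on frames 1..timeline_end.
shuffle_blocks <- function(durations, timeline_end) {
  k <- length(durations)
  free <- timeline_end - sum(durations)  # frames not covered by any block
  stopifnot(k >= 1, free >= 0)
  # Stars and bars: choose k "bar" positions among free + k slots; the number
  # of free frames in front of each bar is the gap in front of each block.
  bars <- sort(sample.int(free + k, k))
  gaps <- diff(c(0L, bars)) - 1L
  d <- durations[sample.int(k)]          # random block order (safe for k = 1)
  # Each block starts after all earlier gaps and earlier blocks.
  begin <- cumsum(gaps + c(0, head(d, -1))) + 1
  data.frame(Frame_beginning = begin, Frame_end = begin + d - 1, Duration = d)
}

# Example data from the question, rebuilt as a plain data.frame.
GrXX_df <- data.frame(
  Individual      = rep(c("A", "B", "C"), times = c(2, 3, 5)),
  Frame_beginning = c(1, 16, 7, 21, 29, 3, 9, 12, 19, 27),
  Frame_end       = c(3, 22, 15, 24, 30, 7, 10, 12, 24, 30)
)
GrXX_df$Duration <- GrXX_df$Frame_end - GrXX_df$Frame_beginning + 1

set.seed(42)  # only to make the sketch reproducible
shuffled <- do.call(rbind, lapply(split(GrXX_df, GrXX_df$Individual), function(g) {
  data.frame(Individual = g$Individual[1], shuffle_blocks(g$Duration, 30))
}))
rownames(shuffled) <- NULL
print(shuffled)
```

Under this draw every split of the free frames into gaps is equally likely, and so is every block order. Whether that matches the distribution produced by `compositionsSample` has not been checked here, and this version makes no attempt to match the speed of the `data.table` function.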