admin管理员组文章数量:1345031
I want to calculate the difference in time of vehicles passing camera 1 (at the entrance) and camera 2 (at the exit). The problem is often data from either camera is missing.
I tried to pivot the data to a wide format. This worked with seq_len()
to create sets, but not when data are missing. Then the function shifted and grouped the wrong times. (In the real data, time is a posix object, but I don't think that matters for the example.)
Here is a short example of the data. As you can see, the camera of observation eight is 1 instead of 2, and hereafter the sequence lags behind.
How can I get this to work? Is there a way seq_len can continue, or another function that works? I tried consecutive_id()
and group_indices()
, but they work the wrong way around. Another idea I could not get to work was adding rows with missing data so camera 1 and 2 keep alternating.
licence <- as.factor(c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "D"))
camera <- as.factor(c(1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2))
time <- 1:12
camera_data <- data.frame(licence, camera, time)
licence camera time
1 A 1 1
2 A 2 2
3 A 1 3
4 A 2 4
5 B 1 5
6 B 2 6
7 C 1 7
8 C 1 8
9 C 2 9
10 C 1 10
11 C 2 11
12 D 2 12
13 D 1 13
14 D 2 14
camera_data_sequence <- camera_data |>
dplyr::group_by(camera) |>
dplyr::arrange(licence, time) |>
dplyr::mutate(sequence = seq_len(dplyr::n()))
licence camera time sequence
<fct> <fct> <int> <int>
1 A 1 1 1
2 A 2 2 1
3 A 1 3 2
4 A 2 4 2
5 B 1 5 3
6 B 2 6 3
7 C 1 7 4
8 C 1 8 5
9 C 2 9 4 # this should be group 5
10 C 1 10 6
11 C 2 11 5 # this should be group 6
12 D 2 12 6 # this should be group 7
13 D 1 13 7 # this should be group 8
14 D 2 14 7 # this should be group 8
camera_data_sequence2 <- camera_data_sequence |>
tidyr::pivot_wider(names_from = camera,
values_from = time)
licence sequence `1` `2`
<fct> <int> <int> <int>
1 A 1 1 2
2 A 2 3 4
3 B 3 5 6
4 C 4 7 9
5 C 5 8 11
6 C 6 10 NA
7 D 6 NA 12
8 D 7 13 14
I want to calculate the difference in time of vehicles passing camera 1 (at the entrance) and camera 2 (at the exit). The problem is often data from either camera is missing.
I tried to pivot the data to a wide format. This worked with seq_len()
to create sets, but not when data are missing. Then the function shifted and grouped the wrong times. (In the real data, time is a posix object, but I don't think that matters for the example.)
Here is a short example of the data. As you can see, the camera of observation eight is 1 instead of 2, and hereafter the sequence lags behind.
How can I get this to work? Is there a way seq_len can continue, or another function that works? I tried consecutive_id()
and group_indices()
, but they work the wrong way around. Another idea I could not get to work was adding rows with missing data so camera 1 and 2 keep alternating.
licence <- as.factor(c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "D"))
camera <- as.factor(c(1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2))
time <- 1:12
camera_data <- data.frame(licence, camera, time)
licence camera time
1 A 1 1
2 A 2 2
3 A 1 3
4 A 2 4
5 B 1 5
6 B 2 6
7 C 1 7
8 C 1 8
9 C 2 9
10 C 1 10
11 C 2 11
12 D 2 12
13 D 1 13
14 D 2 14
camera_data_sequence <- camera_data |>
dplyr::group_by(camera) |>
dplyr::arrange(licence, time) |>
dplyr::mutate(sequence = seq_len(dplyr::n()))
licence camera time sequence
<fct> <fct> <int> <int>
1 A 1 1 1
2 A 2 2 1
3 A 1 3 2
4 A 2 4 2
5 B 1 5 3
6 B 2 6 3
7 C 1 7 4
8 C 1 8 5
9 C 2 9 4 # this should be group 5
10 C 1 10 6
11 C 2 11 5 # this should be group 6
12 D 2 12 6 # this should be group 7
13 D 1 13 7 # this should be group 8
14 D 2 14 7 # this should be group 8
camera_data_sequence2 <- camera_data_sequence |>
tidyr::pivot_wider(names_from = camera,
values_from = time)
licence sequence `1` `2`
<fct> <int> <int> <int>
1 A 1 1 2
2 A 2 3 4
3 B 3 5 6
4 C 4 7 9
5 C 5 8 11
6 C 6 10 NA
7 D 6 NA 12
8 D 7 13 14
Share
Improve this question
asked yesterday
Raoul Van OostenRaoul Van Oosten
2421 silver badge7 bronze badges
2
|
4 Answers
Reset to default 1licence <- as.factor(c("A", "A", "A", "A", "B", "B",
"C", "C", "C", "C", "C", "D", "D", "D"))
camera <- as.factor(c(1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 2))
time <- 1:14
camera_data <- data.frame(licence, camera, time)
library(dplyr)
library(tidyr)
## doing a pivot_wider, but you could summarize/reframe as well
camera_data %>%
mutate(sequence = cumsum(as.numeric(camera) -
lag(as.numeric(camera), default = 1) < 1)) %>%
pivot_wider(names_from = "camera", values_from = time)
#> # A tibble: 8 × 4
#> licence sequence `1` `2`
#> <fct> <int> <int> <int>
#> 1 A 1 1 2
#> 2 A 2 3 4
#> 3 B 3 5 6
#> 4 C 4 7 NA
#> 5 C 5 8 9
#> 6 C 6 10 11
#> 7 D 7 NA 12
#> 8 D 8 13 14
Created on 2025-04-03 with reprex v2.1.1
library(dplyr)
library(tidyr)
mutate(camera_data, id=row_number()) %>%
complete(licence, camera) %>%
arrange(id) %>%
mutate(sequence=cumsum(camera==1)) # You may need .by=licence
Gives:
# A tibble: 13 × 5
licence camera time id sequence
<fct> <fct> <int> <int> <int>
1 A 1 1 1 1
2 A 2 2 2 1
3 A 1 3 3 2
4 A 2 4 4 2
5 B 1 5 5 3
6 B 2 6 6 3
7 C 1 7 7 4
8 C 1 8 8 5
9 C 2 9 9 5
10 C 1 10 10 6
11 C 2 11 11 6
12 D 2 12 12 6
13 D 1 NA NA 7
Then use summarise or reframe to get the time differences
... %>%
reframe(time.diff=nth(time,2) - first(time), .by=c(licence, sequence))
# A tibble: 8 × 3
licence sequence time.diff
<fct> <int> <int>
1 A 1 1
2 A 2 1
3 B 3 1
4 C 4 NA
5 C 5 1
6 C 6 1
7 D 6 NA
8 D 7 1
Within each license
, use cumsum()
and lag()
to increment sequence
when the current row is for camera 1, the previous row is for camera 2, or it’s the first row.
library(dplyr)
camera_data <- camera_data |>
mutate(
sequence = cumsum(camera == 1 | lag(camera) == 2 | is.na(lag(camera))),
.by = licence
)
licence camera time sequence
1 A 1 1 1
2 A 2 2 1
3 A 1 3 2
4 A 2 4 2
5 B 1 5 1
6 B 2 6 1
7 C 1 7 1
8 C 1 8 2
9 C 2 9 2
10 C 1 10 3
11 C 2 11 3
12 D 2 12 1
13 D 1 13 2
14 D 2 14 2
Then to get time differences:
camera_data |>
summarize(
diff = if (n() == 2) diff(time) else NA,
.by = c(licence, sequence)
)
licence sequence diff
1 A 1 1
2 A 2 1
3 B 1 1
4 C 1 NA
5 C 2 1
6 C 3 1
7 D 1 NA
8 D 2 1
Or, for only sequences with both cameras:
camera_data |>
reframe(diff = diff(time), .by = c(licence, sequence))
licence sequence diff
1 A 1 1
2 A 2 1
3 B 1 1
4 C 2 1
5 C 3 1
6 D 2 1
Try this, which produces balanced data,
> (res <- split(cdat, ~license) |>
+ lapply(transform, event=cumsum(diff(c(camera[1], camera)) != 1)) |>
+ do.call(what='rbind') |> as.data.frame() |>
+ {\(.) merge(.,
+ expand.grid(camera=unique(.$camera),
+ license=unique(.$license),
+ event=unique(.$event)),
+ all=TRUE)}() |>
+ sort_by(~list(license, event)))
license camera event time
1 A 1 1 1
4 A 2 1 2
2 A 1 2 3
5 A 2 2 4
3 A 1 3 NA
6 A 2 3 NA
7 B 1 1 5
10 B 2 1 6
8 B 1 2 NA
11 B 2 2 NA
9 B 1 3 NA
12 B 2 3 NA
13 C 1 1 7
16 C 2 1 NA
14 C 1 2 8
17 C 2 2 9
15 C 1 3 10
18 C 2 3 11
19 D 1 1 NA
22 D 2 1 12
20 D 1 2 13
23 D 2 2 14
21 D 1 3 NA
24 D 2 3 15
and reshape
thereafter.
> reshape(res, direction='wide', idvar=c('license', 'event'), timevar='camera')
license event time.1 time.2
1 A 1 1 2
2 A 2 3 4
3 A 3 NA NA
7 B 1 5 6
8 B 2 NA NA
9 B 3 NA NA
13 C 1 7 NA
14 C 2 8 9
15 C 3 10 11
19 D 1 NA 12
20 D 2 13 14
21 D 3 NA 15
Data:
> dput(cdat)
structure(list(license = c("A", "A", "A", "A", "B", "B", "C",
"C", "C", "C", "C", "D", "D", "D", "D"), camera = c(1L, 2L, 1L,
2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), time = 1:15), class = "data.frame", row.names = c(NA,
-15L))
本文标签: rAdd group ID per set of two observationsStack Overflow
版权声明:本文标题:r - Add group ID per set of two observations - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1743784394a2538415.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
licence <-
has just one"G"
but the dataframe you show has three.) – zephryl Commented yesterday