Using R to generate data based on conditions from multiple rows of another column - Stack Overflow

I have the following data of timestamps and indications of high-risk events. Based on several conditions described below, I want to create a new column stop indicating whether operation should be stopped based on the high-risk conditions. The data look like this.

timestamp       high_risk
9/26/2024 0:00  0
9/26/2024 0:01  1
9/26/2024 0:02  0
9/26/2024 0:03  0
9/26/2024 0:04  0
9/26/2024 0:05  0
9/26/2024 0:06  0
9/26/2024 0:07  1
9/26/2024 0:08  0
9/26/2024 0:09  0
9/26/2024 0:10  0
9/26/2024 0:11  1

My conditions are the following:

- If running (stop = 0) and there is a high-risk condition, stop for 5 minutes. The minute where the high-risk event occurs is included in the 5-minute stop period (minutes 1-5).
- After the 5 minutes, start again if there were no high-risk conditions in the last 2 minutes.
- If operation is currently stopped and there is a high-risk condition in the last 2 minutes, extend the stop by 2 minutes from when the high risk occurred. The +2 minutes is added after the high-risk event and does not include the event itself. For example, there is a high-risk event in minute 11, and this extends the stop by 2 minutes beyond minute 11, so operation is stopped in minutes 12 and 13 as well. The same can be seen at minute 31, with minutes 32 and 33 stopped. This differs slightly from the case where operation is running and stops for 5 minutes (see first bullet point).
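
One way to read the three rules is as a single pass over the rows that remembers the last row still covered by a stop. The sketch below is only an illustration under that reading, not a definitive solution: `compute_stop()` and its arguments `initial_stop` and `extension` are hypothetical names, and it assumes exactly one row per minute with no gaps.

```r
# Hypothetical helper (not from the original post): one pass over the rows,
# remembering the index of the last row that is still covered by a stop.
compute_stop <- function(high_risk, initial_stop = 5L, extension = 2L) {
  n <- length(high_risk)
  stop_flag <- integer(n)
  stop_until <- 0L  # index of the last stopped row; 0 means "running"
  for (i in seq_len(n)) {
    if (high_risk[i] == 1) {
      if (i <= stop_until) {
        # already stopped: stay stopped for `extension` rows after this one
        stop_until <- max(stop_until, i + extension)
      } else {
        # running: stop for `initial_stop` rows, counting this one
        stop_until <- i + initial_stop - 1L
      }
    }
    stop_flag[i] <- as.integer(i <= stop_until)
  }
  stop_flag
}

# Minutes 0:00 to 0:34, copied from the expected-output table
high_risk <- c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0,
               0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0)
expected  <- c(0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
               0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0)

# Check the sketch against the stop column from the table
stopifnot(identical(compute_stop(high_risk), as.integer(expected)))
```

In this reading the restart rule needs no branch of its own: any high-risk event in the last 2 minutes of a stop has already pushed the end of the stop forward, so operation restarts on the first row past `stop_until`.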

I need to generate a new column named stop indicating whether operation is stopped based on the criteria above. The output data should look like the following:

timestamp       high_risk   stop
9/26/2024 0:00      0       0   # no high risk so not stopped
9/26/2024 0:01      1       1   # high risk condition so stopped for the next 5 minutes
9/26/2024 0:02      0       1
9/26/2024 0:03      0       1
9/26/2024 0:04      0       1
9/26/2024 0:05      0       1
9/26/2024 0:06      0       0   # no high-risk condition in the last 2 minutes; not stopped
9/26/2024 0:07      1       1   # high risk condition so stop for the next 5 minutes
9/26/2024 0:08      0       1
9/26/2024 0:09      0       1
9/26/2024 0:10      0       1  
9/26/2024 0:11      1       1   # high-risk condition while stopped in the last 2 minutes so extend stop by 2 minutes
9/26/2024 0:12      0       1
9/26/2024 0:13      0       1
9/26/2024 0:14      0       0   
9/26/2024 0:15      0       0
9/26/2024 0:16      1       1   # high risk condition so stop for the next 5 minutes 
9/26/2024 0:17      1       1   # second high risk condition not in the last 2 minutes so no stop extension
9/26/2024 0:18      0       1
9/26/2024 0:19      0       1
9/26/2024 0:20      0       1
9/26/2024 0:21      0       0
9/26/2024 0:22      1       1  # high risk condition so stop for the next 5 minutes
9/26/2024 0:23      0       1
9/26/2024 0:24      1       1  # second high risk condition not in the last 2 minutes so no stop extension
9/26/2024 0:25      0       1
9/26/2024 0:26      0       1
9/26/2024 0:27      0       0
9/26/2024 0:28      1       1  # high risk condition so stop for the next 5 minutes
9/26/2024 0:29      0       1
9/26/2024 0:30      0       1
9/26/2024 0:31      1       1  # high-risk condition while stopped in the last 2 minutes so extend stop by 2 minutes
9/26/2024 0:32      0       1
9/26/2024 0:33      0       1
9/26/2024 0:34      0       0

I am not sure how to begin coding this in R because I've never had to generate data based on conditions like this. Thanks for helping.

asked Nov 21, 2024 at 19:41 by GForce, edited Nov 22, 2024 at 22:07

It is a little confusing as it appears there should be two processes, either of which could signal high risk...The third [1 <-> 1] pair seems to suggest this, as it should be stop, yet is high risk, so which process signaled? – Chris Commented Nov 21, 2024 at 19:54
If the operation is running, then it stops for 5 minutes during high risk (e.g., minutes 1-5). If it's already stopped, then the stop period is extended by 2 minutes past when the last high-risk activity was recorded (minutes 11-13). There is no limit to the number of times this can happen. The reason the third [1 <-> 1] pair is stopped longer than 5 minutes is the high-risk event at minute 11, which extended the stop period to minutes 12 and 13. Does that help? – GForce Commented Nov 21, 2024 at 21:31
If stopped (and stopped immediately for another 5 units from stop), then restart is suspended between stopped and stopped +5, and there can be no surreptitious restart of high_risk between stopped and stopped +5 in the range of stopped +5 -1 or -2, because the process is stopped. How would +2 min of stop then be added in the backtracking condition of within -2 from high-risk end (i.e., high_risk +5 -2) even exist, unless stopped doesn't actually mean the process stops but something else? On a single process. And do you see this as post hoc or contemporaneous? – Chris Commented Nov 22, 2024 at 5:47
The +2 minutes is added after the high risk event, but does not include the high risk event. For example, in minute 11 there is a high-risk event, but this extends the stop by 2 minutes beyond minute 11. So operation is stopped in minutes 12 and 13 as well (the +2 extension does not include minute 11). This can also be seen at minute 31, with minutes 32 and 33 stopped. This is slightly different when it is operating and stops for 5 minutes. The minute where the high-risk event occurs is included in the 5-minute stop period (minutes 1-5). – GForce Commented Nov 22, 2024 at 17:51
Better that these discrete process descriptions make their way to above for clarity. – Chris Commented Nov 22, 2024 at 19:06

1 Answer

This is not very optimized, but I believe it matches the desired output.

library(data.table)
library(lubridate)

# current time
time <- round.POSIXt(Sys.time(), "mins")

# vector aligning with posted question
high_risk <- c(
    0,
    1,
    rep(0, 5),
    1,
    rep(0, 3),
    1,
    rep(0, 4),
    rep(1, 2),
    rep(0, 4),
    1,
    0,
    1,
    rep(0, 3),
    1,
    rep(0, 2),
    1,
    rep(0, 3)
)

# create data.table
dt <- data.table::data.table(
    timestamp = seq(time, time + lubridate::minutes(length(high_risk) - 1), by = "mins"),
    high_risk = high_risk
)

# add index
dt[, index := .I]

# initialize stop col
dt[, stop := NA_integer_]

The below function takes an index and checks for the logic you wrote above. It doesn't use the timestamp column, just the index column, but it could be modified to do so if desired.

f <- function(i) {
    # subset
    dt_subset <- dt[index == i]

    # is high risk?
    risk <- dt_subset[, high_risk == 1]

    if (risk) {
        # are we currently stopped?
        st <- dt[index == i, stop == 1]

        # if currently stopped, +2,
        # otherwise +5
        if (isTRUE(st)) {
            forward <- i:(i + 2)
            forward <- forward[forward %in% dt$index]
        } else {
            forward <- i:(i + 4)
            forward <- forward[forward %in% dt$index]
        }

        dt[index %in% forward, c("stop") := 1]
    } else if (!risk && is.na(dt_subset[, stop])) {
        dt[index == i, c("stop") := 0]
    }
}

# run the function in a loop since current answer depends on previous index
# results
for (i in dt$index) {
    f(i)
}

# remove index column
dt[, index := NULL]

print(dt)

This results in

              timestamp high_risk  stop
                 <POSc>     <num> <int>
 1: 2024-11-25 16:52:00         0     0
 2: 2024-11-25 16:53:00         1     1
 3: 2024-11-25 16:54:00         0     1
 4: 2024-11-25 16:55:00         0     1
 5: 2024-11-25 16:56:00         0     1
 6: 2024-11-25 16:57:00         0     1
 7: 2024-11-25 16:58:00         0     0
 8: 2024-11-25 16:59:00         1     1
 9: 2024-11-25 17:00:00         0     1
10: 2024-11-25 17:01:00         0     1
11: 2024-11-25 17:02:00         0     1
12: 2024-11-25 17:03:00         1     1
13: 2024-11-25 17:04:00         0     1
14: 2024-11-25 17:05:00         0     1
15: 2024-11-25 17:06:00         0     0
16: 2024-11-25 17:07:00         0     0
17: 2024-11-25 17:08:00         1     1
18: 2024-11-25 17:09:00         1     1
19: 2024-11-25 17:10:00         0     1
20: 2024-11-25 17:11:00         0     1
21: 2024-11-25 17:12:00         0     1
22: 2024-11-25 17:13:00         0     0
23: 2024-11-25 17:14:00         1     1
24: 2024-11-25 17:15:00         0     1
25: 2024-11-25 17:16:00         1     1
26: 2024-11-25 17:17:00         0     1
27: 2024-11-25 17:18:00         0     1
28: 2024-11-25 17:19:00         0     0
29: 2024-11-25 17:20:00         1     1
30: 2024-11-25 17:21:00         0     1
31: 2024-11-25 17:22:00         0     1
32: 2024-11-25 17:23:00         1     1
33: 2024-11-25 17:24:00         0     1
34: 2024-11-25 17:25:00         0     1
35: 2024-11-25 17:26:00         0     0
              timestamp high_risk  stop
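
The function above works on the index column only. As a rough illustration of what a timestamp-based version might look like, the sketch below tracks the time until which operation stays stopped instead of a row index, so missing minutes do not stretch a stop. `compute_stop_by_time()` and its parameters are hypothetical and not part of the answer above; it assumes `timestamp` is a sorted POSIXct vector aligned with `high_risk`.

```r
# Hypothetical timestamp-based variant; all names are illustrative.
# Assumes `timestamp` is a sorted POSIXct vector aligned with `high_risk`.
compute_stop_by_time <- function(timestamp, high_risk,
                                 initial_mins = 5, extension_mins = 2) {
  stop_flag <- integer(length(high_risk))
  if (length(high_risk) == 0L) return(stop_flag)
  stop_until <- timestamp[1] - 1  # one second before the first row: "running"
  for (i in seq_along(high_risk)) {
    now <- timestamp[i]
    if (high_risk[i] == 1) {
      if (now <= stop_until) {
        # already stopped: extend to `extension_mins` past this event
        stop_until <- max(stop_until, now + extension_mins * 60)
      } else {
        # running: the stop covers this minute plus the following ones
        stop_until <- now + (initial_mins - 1) * 60
      }
    }
    stop_flag[i] <- as.integer(now <= stop_until)
  }
  stop_flag
}

# Parse a timestamp in the format used in the question
start <- as.POSIXct("9/26/2024 0:00", format = "%m/%d/%Y %H:%M", tz = "UTC")

# Rows at minutes 0, 1, 2 and 10: the stop started at minute 1 ends at
# minute 5, so the row at minute 10 is running again despite the gap.
gappy_time <- start + 60 * c(0, 1, 2, 10)
stop_gappy <- compute_stop_by_time(gappy_time, c(0, 1, 0, 0))
print(stop_gappy)
```

On evenly spaced one-minute data this should agree with the index-based result; the two only differ when rows are missing.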

I imagine one could use some sort of rolling function like data.table::frollapply, but I didn't have the time to look into that.
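
I have not checked whether `data.table::frollapply` fits here. Each row's result depends on whether operation was already stopped, which seems hard to express with a fixed-width window over `high_risk` alone. As one possible alternative to the explicit loop, base R's `Reduce()` with `accumulate = TRUE` can carry that state from row to row. This is a sketch under the same one-row-per-minute assumption; `advance_stop()`, `stop_until` and `stop_flag` are hypothetical names.

```r
# High-risk flags for minutes 0:00 to 0:34, as in the question
high_risk <- c(0, 1, rep(0, 5), 1, rep(0, 3), 1, rep(0, 4), rep(1, 2),
               rep(0, 4), 1, 0, 1, rep(0, 3), 1, rep(0, 2), 1, rep(0, 3))
idx <- seq_along(high_risk)

# Hypothetical helper: given the last stopped row so far (0 = running) and
# the current row index, return the updated last stopped row.
advance_stop <- function(stop_until, i) {
  if (high_risk[i] != 1) return(stop_until)
  # stopped: extend by 2 rows past this one; running: stop for 5 rows in total
  if (i <= stop_until) max(stop_until, i + 2L) else i + 4L
}

# Accumulate the state over all rows, then drop the initial value
stop_until <- Reduce(advance_stop, idx, init = 0L, accumulate = TRUE)[-1]
stop_flag <- as.integer(idx <= stop_until)
```

With the table built above, the result could presumably be attached with `dt[, stop := stop_flag]`.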
