I am new to R. The vector I am analyzing is a sequence of items (a, a, b, a, b, b, b, a, etc.). I need to compute the probability of a (the previous element) being followed by b (the next element), of a being followed by a, of b being followed by a, and of b being followed by b. The four results should form a matrix. Can you please advise how to do it? I have read about the str_detect, duplicate_count, and str_count functions, but I do not know how to apply them to my task.
Thank you in advance for help and reply!
Asked Nov 22, 2024 at 21:06 by Mike_R; edited Nov 23, 2024 at 5:24 by jpsmith.
1 Answer
I'll use the following vector as an example:
(seed <- sample(.Machine$integer.max, 1))
#> [1] 2041751758
set.seed(seed) # for reproducibility
(x <- sample(c("a", "b"), 10, TRUE))
#> [1] "b" "b" "a" "a" "a" "b" "b" "b" "a" "a"
The easiest way is to use `table`:
table(data.frame(from = x[-length(x)], to = x[-1]))
#>     to
#> from a b
#>    a 3 1
#>    b 2 3
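The question asks for probabilities rather than counts. As a minimal sketch, assuming the same example vector (hard-coded here so the block does not depend on the random seed), the count table can be row-normalized with `prop.table` so that each row sums to 1:

```r
# Example vector from above, hard-coded so the block runs on its own
x <- c("b", "b", "a", "a", "a", "b", "b", "b", "a", "a")

# Count transitions: rows are the "from" value, columns the "to" value
counts <- table(from = x[-length(x)], to = x[-1])

# Divide each row by its total to estimate P(to | from)
probs <- prop.table(counts, margin = 1)
probs
```

With this vector, the "a" row works out to 0.75 and 0.25, and the "b" row to 0.4 and 0.6. Normalizing by row is a design choice that matches the "previous element becoming next element" wording; `margin = 2` would condition on the "to" value instead.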
To do the same thing manually, use `matrix`, `tabulate`, and Boolean arithmetic. First, encode the four possible transitions as integers. For a vectorized solution, `x[-length(x)]` will be the "from" value in each transition, and `x[-1]` will be the "to" value. If the "from" value is `"a"`, add 0. If it is `"b"`, add 1. If the "to" value is `"a"`, add 0. If it is `"b"`, add 2. Add 1 to each result so that every transition maps to an integer between 1 and 4.
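To make the encoding concrete, here is a small sketch that applies it to the example vector (hard-coded so the block runs on its own). The names `from`, `to`, and `code` are illustrative and not part of the original answer:

```r
x <- c("b", "b", "a", "a", "a", "b", "b", "b", "a", "a")

from <- x[-length(x)]  # every element except the last
to   <- x[-1]          # every element except the first

# In arithmetic, TRUE counts as 1 and FALSE as 0
code <- (from == "b") + 2L * (to == "b") + 1L

# 1 = a->a, 2 = b->a, 3 = a->b, 4 = b->b
data.frame(from, to, code)
```

Because the "from" value contributes the low bit, the codes are already in column-major order for a matrix whose rows are "from" and whose columns are "to".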
`tabulate` counts the number of occurrences of each integer value in a vector. There are four possibilities, so set `tabulate`'s `nbins` argument to `4L`.
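A short sketch of why `nbins` matters, using a made-up vector of codes in which some transitions never occur (the variable names are illustrative):

```r
# Only codes 1 and 3 occur here; codes 2 and 4 are absent
code <- c(1L, 3L, 3L, 1L, 1L)

# Without nbins, the number of bins is inferred from max(code)
inferred <- tabulate(code)

# With nbins = 4L, there are always four bins, with zeros for unseen codes
fixed <- tabulate(code, nbins = 4L)

inferred
fixed
```

Fixing the number of bins guarantees that exactly four counts are available to fill the 2 x 2 matrix, even when a transition is missing from the data.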
Finally, put the results of `tabulate` in a matrix with 2 rows and 2 columns and set the names as desired. I set mine so that "a->" means a transition started with `"a"`, and "->a" means the transition ended with `"a"`.
matrix(tabulate((x[-length(x)] == "b") + 2L*(x[-1] == "b") + 1L, 4L),
2, 2, 0, list(c("a->", "b->"), c("->a", "->b")))
#>     ->a ->b
#> a->   3   1
#> b->   2   3
Compare the performance of the two approaches with a larger vector:
x <- sample(c("a", "b"), 1e6, TRUE)
trans1 <- function(x) table(data.frame(from = x[-length(x)], to = x[-1]))
trans2 <- function(x) {
  x <- x == "b"
  matrix(tabulate(x[-length(x)] + 2L*x[-1] + 1L, 4L),
         2, 2, 0, list(c("a->", "b->"), c("->a", "->b")))
}
microbenchmark::microbenchmark(
  table = trans1(x),
  tabulate = trans2(x)
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> table 84.2272 88.4737 102.27207 91.48075 123.30125 133.5508 100 a
#> tabulate 25.1709 26.7004 32.58225 27.66895 30.02525 68.6166 100 b
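Before relying on the faster version, it may be worth confirming that the two functions return the same counts. The sketch below redefines both so it runs on its own; the seed and the vector length are arbitrary choices for illustration:

```r
trans1 <- function(x) table(data.frame(from = x[-length(x)], to = x[-1]))
trans2 <- function(x) {
  x <- x == "b"
  matrix(tabulate(x[-length(x)] + 2L*x[-1] + 1L, 4L),
         2, 2, 0, list(c("a->", "b->"), c("->a", "->b")))
}

set.seed(1)  # arbitrary seed, chosen only for this illustration
x <- sample(c("a", "b"), 1000, TRUE)

# Compare the raw counts, ignoring the differing dimnames
same <- all(as.vector(trans1(x)) == as.vector(trans2(x)))
same
```

This comparison assumes both `"a"` and `"b"` appear in the "from" and "to" positions. If one value were missing, `table` would drop that row or column, whereas the `tabulate` version always returns a 2 x 2 matrix.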