I have a list of times and values that are not standardized; in other words, the times when values are assigned are not consistent across the data. Some elements may have values assigned at t=1,3,5 while others have values assigned at t=2,4,6,8. How can I convert these values to a standard m x n matrix format? I would be okay with filling in the blanks with NA, but ideally I would like to impute the missing values.
Example:
set.seed(1)
ids <- 1:5
# create a time-based function that generates a new value based on previous value
age_fn <- function(prior_value, age) prior_value - 1.2*age
my_list <- list()
for(i in 1:length(ids))
{
  # how many records for this id?
  N <- sample(c(2:4), 1, replace=TRUE)
  # for each record, assign a time-step
  time_step <- sample(c(1:3), N-1, replace=TRUE) # minus 1 because first record is at t=0
  # define time when values are recorded
  t <- rep(0, times=N)
  for(n in 2:N)
  {
    t[n] <- t[n-1] + time_step[n-1]
  }
  # assign values recorded at each time
  value <- rep(100, times=N)
  for(n in 2:N)
  {
    value[n] <- age_fn(value[n-1], t[n])
  }
  my_list[[i]] <- list(ids[i], t, value)
}
As an example, the third element has two values while the fourth has four, ranging from t=0 to t=7:
> my_list
[[3]]
[[3]][[1]]
[1] 3
[[3]][[2]]
[1] 0 3
[[3]][[3]]
[1] 100.0 96.4
[[4]]
[[4]][[1]]
[1] 4
[[4]][[2]]
[1] 0 2 4 7
[[4]][[3]]
[1] 100.0 97.6 92.8 84.4
I would like this as a 2x8 matrix [element, t], with values (or NA) filled in where they don't exist in the non-standard format:
          0   1     2    3    4    5    6    7
values1 100 100 100.0 96.4 96.4 96.4 96.4 96.4
values2 100 100  97.6 97.6 92.8 92.8 92.8 84.4
Edit: edited with set.seed for reproducibility and example output
2 Answers
You could easily do
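m = max(unlist(lapply(my_list, `[[`, 2)))  # latest time step across all elements
v = rep(NA, m+1)                            # template row covering t = 0..m
sapply(my_list, \(l) {
  v[l[[2]] + 1] = l[[3]]                    # place values at their time steps (t = 0 is column 1)
  zoo::na.locf(v) }) |>                     # carry the last observation forward
  t() |>
  `dimnames<-`(list(paste0('values', sapply(my_list, '[[', 1)), 0:m))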
          0     1     2    3    4    5    6    7
values1 100 100.0 100.0 96.4 96.4 96.4 96.4 96.4
values2 100 100.0  97.6 97.6 97.6 97.6 97.6 97.6
values3 100 100.0 100.0 96.4 96.4 96.4 96.4 96.4
values4 100 100.0  97.6 97.6 92.8 92.8 92.8 84.4
values5 100  98.8  96.4 92.8 92.8 92.8 92.8 92.8
Without zoo::na.locf():
          0    1    2    3    4  5  6    7
values1 100   NA   NA 96.4   NA NA NA   NA
values2 100   NA 97.6   NA   NA NA NA   NA
values3 100   NA   NA 96.4   NA NA NA   NA
values4 100   NA 97.6   NA 92.8 NA NA 84.4
values5 100 98.8 96.4 92.8   NA NA NA   NA
Do lapply( ... ) |> do.call(what='rbind') instead, if you prefer lapply() (I do).
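The lapply() variant mentioned above could look roughly like the sketch below. The data is hard-coded from the output printed in this answer so the block runs on its own, and locf() is a hypothetical base-R stand-in for zoo::na.locf() that assumes the first entry of each row is not NA (true here, since every id starts at t = 0).

```r
# Example data reconstructed from the output above: list(id, times, values)
my_list <- list(
  list(1L, c(0, 3),       c(100, 96.4)),
  list(2L, c(0, 2),       c(100, 97.6)),
  list(3L, c(0, 3),       c(100, 96.4)),
  list(4L, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4)),
  list(5L, c(0, 1, 2, 3), c(100, 98.8, 96.4, 92.8))
)

m <- max(unlist(lapply(my_list, `[[`, 2)))  # latest time step

# Hypothetical helper (not from zoo): last observation carried forward.
# Assumes v[1] is not NA.
locf <- function(v) v[cummax(seq_along(v) * !is.na(v))]

res <- lapply(my_list, \(l) {
  v <- rep(NA_real_, m + 1)
  v[l[[2]] + 1] <- l[[3]]   # t = 0 goes to column 1
  locf(v)
}) |>
  do.call(what = "rbind") |>  # one row per list element, no t() needed
  `dimnames<-`(list(paste0("values", sapply(my_list, `[[`, 1)), 0:m))

res
```

Since each lapply() result becomes a row under rbind, the transpose from the sapply() version is not needed. Swapping locf(v) back to zoo::na.locf(v) should give the same result for this data.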
Assumptions:
- there are no duplicated ts for each id,
- and each list element corresponds to a unique id.
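If you are unsure whether these assumptions hold for your data, a quick check along the following lines may help. This is only a sketch: the data is hard-coded from the output above, and the names dup_times and all_ids are made up for illustration.

```r
# Example data reconstructed from the output above: list(id, times, values)
my_list <- list(
  list(1L, c(0, 3),       c(100, 96.4)),
  list(2L, c(0, 2),       c(100, 97.6)),
  list(3L, c(0, 3),       c(100, 96.4)),
  list(4L, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4)),
  list(5L, c(0, 1, 2, 3), c(100, 98.8, 96.4, 92.8))
)

# Assumption 1: no time step occurs twice within one list element
dup_times <- vapply(my_list, \(l) anyDuplicated(l[[2]]) > 0, logical(1))

# Assumption 2: every list element carries a different id
all_ids <- vapply(my_list, \(l) as.integer(l[[1]]), integer(1))
dup_ids <- anyDuplicated(all_ids) > 0

# Stops with an error if either assumption is violated
stopifnot(!any(dup_times), !dup_ids)
```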
We can optimise if the list is very long.
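One possible direction for that, sketched under the same assumptions as above, is to skip the per-element function call and fill a pre-allocated NA matrix in a single step via matrix indexing. The data is again hard-coded from the output above; this is an illustration, not a benchmarked solution.

```r
# Example data reconstructed from the output above: list(id, times, values)
my_list <- list(
  list(1L, c(0, 3),       c(100, 96.4)),
  list(2L, c(0, 2),       c(100, 97.6)),
  list(3L, c(0, 3),       c(100, 96.4)),
  list(4L, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4)),
  list(5L, c(0, 1, 2, 3), c(100, 98.8, 96.4, 92.8))
)

n_obs <- lengths(lapply(my_list, `[[`, 2))    # records per list element
i <- rep(seq_along(my_list), times = n_obs)   # row index of every record
j <- unlist(lapply(my_list, `[[`, 2)) + 1     # column index (t = 0 is column 1)
x <- unlist(lapply(my_list, `[[`, 3))         # the recorded values

out <- matrix(
  NA_real_, nrow = length(my_list), ncol = max(j),
  dimnames = list(paste0("values", sapply(my_list, `[[`, 1)),
                  as.character(0:(max(j) - 1)))
)
out[cbind(i, j)] <- x   # one vectorised assignment for all records

out
```

This gives the NA-padded matrix; filling the gaps (for example row-wise with zoo::na.locf()) would be a separate step.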
What you have there is pretty close to a sparse Matrix.
L <- lapply(my_list, \(L) {
  L[[1]] <- rep(L[[1]], length(L[[2]]))
  L[[2]] <- L[[2]] + 1L
  L
})
i <- do.call(c, lapply(L, "[[", 1))
j <- do.call(c, lapply(L, "[[", 2))
x <- do.call(c, lapply(L, "[[", 3))
library(Matrix)
M <- sparseMatrix(i, j, x = x)
M
#5 x 8 sparse Matrix of class "dgCMatrix"
#
#[1,] 100 . . 96.4 . . . .
#[2,] 100 . 97.6 . . . . .
#[3,] 100 . . 96.4 . . . .
#[4,] 100 . 97.6 . 92.8 . . 84.4
#[5,] 100 98.8 96.4 92.8 . . . .
I don't know what the subsequent steps in your analysis are, but sparse matrices can offer significant benefits regarding memory demand and performance. Of course, you can always do as.matrix(M) to get a dense matrix.
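One thing to keep in mind: as.matrix(M) fills the empty cells with 0, not NA. If you need NA for the unobserved time steps, something like the sketch below might work. It uses only the first two rows of the matrix shown above, typed in by hand, and relies on summary() of a sparse matrix returning the i, j, x triplets.

```r
library(Matrix)

# First two rows of the sparse matrix shown above, entered as triplets
M <- sparseMatrix(i = c(1, 1, 2, 2),
                  j = c(1, 4, 1, 3),
                  x = c(100, 96.4, 100, 97.6))

dense0 <- as.matrix(M)   # unobserved cells become 0 here

# Dense matrix with NA for unobserved cells instead
s <- summary(M)          # triplet view with columns i, j, x
D <- matrix(NA_real_, nrow = nrow(M), ncol = ncol(M))
D[cbind(s$i, s$j)] <- s$x

D
```

Going through the triplets avoids the shortcut D[D == 0] <- NA, which would also wipe out any genuine zero values.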
If you want to impute values, you probably should use a more sophisticated approach than what you propose in your question. E.g., the example data looks a bit like you could fit a non-linear mixed-effects model based on the exponential decay function and predict missing values from that model.
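As a much simpler middle ground (explicitly not the mixed-effects model suggested above), linear interpolation between the recorded time points can be sketched with stats::approx(). With rule = 2, the last recorded value is carried forward past the final observation, which is an assumption you may not want for your data. The example data is hard-coded from the output shown earlier.

```r
# Example data reconstructed from the output above: list(id, times, values)
my_list <- list(
  list(1L, c(0, 3),       c(100, 96.4)),
  list(2L, c(0, 2),       c(100, 97.6)),
  list(3L, c(0, 3),       c(100, 96.4)),
  list(4L, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4)),
  list(5L, c(0, 1, 2, 3), c(100, 98.8, 96.4, 92.8))
)

m <- max(unlist(lapply(my_list, `[[`, 2)))  # latest time step

# Interpolate each element onto the common grid 0..m
imputed <- t(sapply(my_list, \(l) {
  approx(x = l[[2]], y = l[[3]], xout = 0:m,
         method = "linear", rule = 2)$y     # rule = 2: hold the end values
}))
dimnames(imputed) <- list(paste0("values", sapply(my_list, `[[`, 1)),
                          as.character(0:m))

imputed
```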
my_list has five ids, 1 to 5, the desired matrix suggests two (values1 and values2)? – Friede, commented Mar 11 at 16:43
set.seed(1)? – Tim G, commented Mar 11 at 16:56