I have a list of times and values that are not standardized; in other words, the times at which values are recorded are not consistent across the data. Some elements may have values at t=1,3,5 while others have values at t=2,4,6,8. How can I convert these values to a standard m x n matrix? I would be okay with filling in the blanks with NA, but ideally I would like to impute the missing values.

Example:

set.seed(1)
ids <- 1:5

# create a time-based function that generates a new value based on previous value
age_fn <- function(prior_value, age) prior_value - 1.2*age

my_list <- list()
for(i in 1:length(ids))
{
  # how many records for this id?
  N <- sample(c(2:4), 1, replace=TRUE)
  
  # for each record, assign a time-step
  time_step <- sample(c(1:3), N-1, replace=TRUE) # minus 1 because first record is at t=0
  # define time when values are recorded
  t <- rep(0, times=N)
  for(n in 2:N)
  {
    t[n] <- t[n-1] + time_step[n-1]
  }
  
  # assign values recorded at each time
  value <- rep(100, times=N)
  for(n in 2:N)
  {
    value[n] <- age_fn(value[n-1], t[n])
  }
  
  my_list[[i]] <- list(ids[i], t, value)
}
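The two inner loops have closed forms. t is the cumulative sum of the time steps. Unrolling value[n] = value[n-1] - 1.2*t[n] from value[1] = 100 gives value = 100 - 1.2*cumsum(t). Below is a sketch of a vectorised record generator. make_record() is a hypothetical name. It takes the time steps as an argument instead of sampling them, so on its own it does not reproduce the set.seed() random stream.

```r
# Hypothetical vectorised version of one iteration of the loop above.
# Unrolling value[n] <- value[n-1] - 1.2 * t[n] with value[1] = 100 and t[1] = 0
# gives value = 100 - 1.2 * cumsum(t).
make_record <- function(id, time_step) {
  t <- cumsum(c(0, time_step))    # first record at t = 0
  value <- 100 - 1.2 * cumsum(t)  # closed form of the age_fn recursion
  list(id, t, value)
}

# Time steps 2, 2, 3 reproduce element 4 of the printed excerpt
rec <- make_record(4, c(2, 2, 3))
rec[[2]]  # 0 2 4 7
rec[[3]]  # 100.0 97.6 92.8 84.4
```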

As an example, element 3 has two values while element 4 has four, ranging from t=0 to t=7 (excerpt):

> my_list

[[3]]
[[3]][[1]]
[1] 3

[[3]][[2]]
[1] 0 3

[[3]][[3]]
[1] 100.0  96.4


[[4]]
[[4]][[1]]
[1] 4

[[4]][[2]]
[1] 0 2 4 7

[[4]][[3]]
[1] 100.0  97.6  92.8  84.4

I would like this as a 2x8 matrix [element, t], with values (or NA) filled in where they don't exist in the non-standard format:

          0   1     2    3    4    5    6    7
values1 100 100 100.0 96.4 96.4 96.4 96.4 96.4
values2 100 100  97.6 97.6 92.8 92.8 92.8 84.4

Edit: added set.seed for reproducibility, plus example output
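A common first step is to flatten the list into long (id, t, value) form, after which standard tabulation tools apply. The following is a minimal base-R sketch. It assumes each element has the question's list(id, t, value) layout. The two elements are hard-coded from the printed excerpt so that it runs without the generator. Note that mean() is used as the cell function, so if an id ever had the same t twice, the duplicates would be silently averaged.

```r
# Two elements copied from the printed my_list excerpt: list(id, t, value)
my_list <- list(
  list(3, c(0, 3), c(100, 96.4)),
  list(4, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4))
)

# Long format: one row per recorded (id, t, value) triplet
long <- do.call(rbind, lapply(my_list, function(el) {
  data.frame(id = el[[1]], t = el[[2]], value = el[[3]])
}))

# Cross-tabulate id x t over the full grid 0..max(t); cells with no record become NA
wide <- tapply(long$value,
               list(id = long$id, t = factor(long$t, levels = 0:max(long$t))),
               mean)
wide
```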

Asked Mar 11 at 16:30 by coolhand; edited Mar 11 at 17:50 by Friede.
  • The list generation looks odd and suggests an XY problem (xyproblem.info). Where is the data from? – Friede, Mar 11 at 16:38
  • @Friede The data above is completely made up. The real data is from inspection records. The main point is that the inspection information is non-standard: the timing and the number of inspections differ between elements. – coolhand, Mar 11 at 16:40
  • my_list has five ids, 1 to 5, but the desired matrix suggests two (values1 and values2)? – Friede, Mar 11 at 16:43
  • @Friede Because I'm only showing the first two elements, assuming the other elements would follow the same pattern. – coolhand, Mar 11 at 16:46
  • @coolhand Since you use sample, can you please use set.seed(1)? – Tim G, Mar 11 at 16:56

2 Answers

You could easily do

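# latest time observed across all elements
m = max(unlist(lapply(my_list, `[[`, 2)))
# empty template row, one slot per time 0..m
v = rep(NA, m+1)
sapply(my_list, \(l) {
  v[l[[2]] + 1] = l[[3]]  # time t goes to position t+1 (modifies a local copy of v)
  zoo::na.locf(v) }) |>   # carry the last observation forward
  t() |>
  `dimnames<-`(list(paste0('values', sapply(my_list, '[[', 1)), 0:m))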
          0     1     2    3    4    5    6    7
values1 100 100.0 100.0 96.4 96.4 96.4 96.4 96.4
values2 100 100.0  97.6 97.6 97.6 97.6 97.6 97.6
values3 100 100.0 100.0 96.4 96.4 96.4 96.4 96.4
values4 100 100.0  97.6 97.6 92.8 92.8 92.8 84.4
values5 100  98.8  96.4 92.8 92.8 92.8 92.8 92.8

Without zoo::na.locf(), i.e. leaving the gaps as NA:

          0    1    2    3    4  5  6    7
values1 100   NA   NA 96.4   NA NA NA   NA
values2 100   NA 97.6   NA   NA NA NA   NA
values3 100   NA   NA 96.4   NA NA NA   NA
values4 100   NA 97.6   NA 92.8 NA NA 84.4
values5 100 98.8 96.4 92.8   NA NA NA   NA

If you prefer lapply() (I do), use lapply( ... ) |> do.call(what='rbind') instead of sapply( ... ) |> t().
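Spelled out, the lapply() variant might look like the sketch below. To drop the zoo dependency, it swaps zoo::na.locf() for a small hypothetical helper, locf(). Each position in locf() points to the latest non-NA index via cummax(). The helper assumes the first entry (t = 0) is never NA, which holds here because every id has a record at t = 0. The list is hard-coded from the NA table above.

```r
# Hypothetical base-R stand-in for zoo::na.locf() on a vector whose
# first element is not NA: each position points to the latest non-NA index.
locf <- function(v) {
  idx <- seq_along(v)
  idx[is.na(v)] <- 0L
  v[cummax(idx)]
}

# Elements copied from the NA table above: list(id, t, value)
my_list <- list(
  list(1, c(0, 3), c(100, 96.4)),
  list(2, c(0, 2), c(100, 97.6)),
  list(3, c(0, 3), c(100, 96.4)),
  list(4, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4)),
  list(5, c(0, 1, 2, 3), c(100, 98.8, 96.4, 92.8))
)

m <- max(unlist(lapply(my_list, `[[`, 2)))
res <- lapply(my_list, \(l) {
  v <- rep(NA_real_, m + 1)
  v[l[[2]] + 1] <- l[[3]]   # t = 0 goes to position 1
  locf(v)
}) |> do.call(what = 'rbind')
dimnames(res) <- list(paste0('values', sapply(my_list, `[[`, 1)), 0:m)
res
```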

Assumptions:

  • there are no duplicated times t within an id,
  • and each list element corresponds to a unique id.
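Both assumptions can be checked up front. Below is a minimal sketch with a hypothetical helper, check_records(). It stops if any id repeats or if any id has a duplicated t. The "bad" element is a deliberately broken toy example, not real data.

```r
# Hypothetical validator for the two assumptions above;
# lst holds list(id, t, value) elements.
check_records <- function(lst) {
  ids <- unlist(lapply(lst, `[[`, 1))
  if (anyDuplicated(ids) > 0) stop("ids are not unique")
  dup_t <- vapply(lst, function(el) anyDuplicated(el[[2]]) > 0, logical(1))
  if (any(dup_t)) stop("duplicated t for id(s): ", paste(ids[dup_t], collapse = ", "))
  invisible(TRUE)
}

ok  <- list(list(1, c(0, 3), c(100, 96.4)), list(2, c(0, 2), c(100, 97.6)))
bad <- list(list(1, c(0, 3, 3), c(100, 96.4, 96.4)))  # toy element with t = 3 twice
check_records(ok)        # returns invisibly: both assumptions hold
try(check_records(bad))  # signals an error naming id 1
```

The check matters because, with a duplicated t, the assignment v[l[[2]] + 1] = l[[3]] in the answer silently keeps only the last value written for that time.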

This can be optimised if the list is very long.
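One possible optimisation avoids the per-element loop entirely. This is a sketch, not the answerer's code. It concatenates all (row, t, value) triplets, fills an NA matrix with a single matrix-indexed assignment, and then forward-fills column by column. That loop runs over time points rather than elements, which is usually far fewer. wide_locf() is a hypothetical name. The sketch assumes integer-valued t >= 0 and a record at t = 0 for every element.

```r
# Sketch: build the filled matrix without a per-element loop.
# Assumes list(id, t, value) elements, integer-valued t >= 0,
# and a record at t = 0 for every element (so column 1 has no NA).
wide_locf <- function(lst) {
  n_rec <- lengths(lapply(lst, `[[`, 2))         # records per element
  rows  <- rep(seq_along(lst), n_rec)             # matrix row for each record
  cols  <- unlist(lapply(lst, `[[`, 2)) + 1       # t = 0 -> column 1
  vals  <- unlist(lapply(lst, `[[`, 3))
  W <- matrix(NA_real_, nrow = length(lst), ncol = max(cols))
  W[cbind(rows, cols)] <- vals                    # one assignment for all records
  for (k in seq_len(ncol(W))[-1]) {               # loop over time points, not elements
    miss <- is.na(W[, k])
    W[miss, k] <- W[miss, k - 1]                  # carry the previous column forward
  }
  dimnames(W) <- list(paste0("values", unlist(lapply(lst, `[[`, 1))),
                      0:(ncol(W) - 1))
  W
}

# Elements copied from the NA table above: list(id, t, value)
my_list <- list(
  list(1, c(0, 3), c(100, 96.4)),
  list(2, c(0, 2), c(100, 97.6)),
  list(3, c(0, 3), c(100, 96.4)),
  list(4, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4)),
  list(5, c(0, 1, 2, 3), c(100, 98.8, 96.4, 92.8))
)
W <- wide_locf(my_list)
W
```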

What you have there is pretty close to a sparse Matrix.

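# one (row, column, value) triplet per record: row = id (assumes ids are 1, 2, ...), column = t + 1
L <- lapply(my_list, \(el) {
  el[[1]] <- rep(el[[1]], length(el[[2]]))  # repeat the id once per record
  el[[2]] <- el[[2]] + 1L                   # t = 0 maps to column 1
  el
})

i <- do.call(c, lapply(L, "[[", 1))  # row indices
j <- do.call(c, lapply(L, "[[", 2))  # column indices
x <- do.call(c, lapply(L, "[[", 3))  # recorded values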

library(Matrix)
M <- sparseMatrix(i, j, x = x)
M
#5 x 8 sparse Matrix of class "dgCMatrix"
#                                     
#[1,] 100  .    .   96.4  .   . .  .  
#[2,] 100  .   97.6  .    .   . .  .  
#[3,] 100  .    .   96.4  .   . .  .  
#[4,] 100  .   97.6  .   92.8 . . 84.4
#[5,] 100 98.8 96.4 92.8  .   . .  . 

I don't know what the subsequent steps in your analysis are, but sparse matrices can offer significant benefits in memory use and performance. Of course, you can always do as.matrix(M) to get a dense matrix.
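One caveat is that as.matrix(M) shows the cells without a record as 0, not NA. A genuine recorded 0 would therefore be indistinguishable from a gap. If NA is wanted for the gaps, the same (i, j, x) triplets can fill a dense NA matrix directly. Below is a base-R sketch, with the triplets hard-coded from the printed M so it runs without the Matrix package.

```r
# Triplets as in the answer: row = id, column = t + 1, value
i <- c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5)
j <- c(1, 4, 1, 3, 1, 4, 1, 3, 5, 8, 1, 2, 3, 4)
x <- c(100, 96.4, 100, 97.6, 100, 96.4, 100, 97.6, 92.8, 84.4, 100, 98.8, 96.4, 92.8)

D <- matrix(NA_real_, nrow = max(i), ncol = max(j))
D[cbind(i, j)] <- x   # unrecorded cells stay NA rather than 0
D
```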

If you want to impute values, you should probably use a more sophisticated approach than the one you propose in your question. For instance, the example data look like they could be described by a nonlinear mixed-effects model based on an exponential decay function; you could fit such a model and predict the missing values from it.
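A full nonlinear mixed-effects fit (for example with the nlme package) is beyond a short example. The idea can still be sketched with a simpler, swapped-in model: a single pooled exponential decay, value = 100 * exp(-k * t), fitted with stats::nls() to the records from the first answer's NA table. The fitted curve then fills only the unobserved cells. Both the fixed starting value of 100 and the single rate k shared by all ids are assumptions of this sketch; a mixed model would let k vary by id.

```r
# Records copied from the NA table in the first answer; the t = 0 entries
# (all 100) are left out because the model below fixes value(0) = 100.
obs <- data.frame(
  id    = c(1, 2, 3, 4, 4, 4, 5, 5, 5),
  t     = c(3, 2, 3, 2, 4, 7, 1, 2, 3),
  value = c(96.4, 97.6, 96.4, 97.6, 92.8, 84.4, 98.8, 96.4, 92.8)
)

# Assumption: one pooled decay rate k shared by all ids (no random effects)
fit <- nls(value ~ 100 * exp(-k * t), data = obs, start = list(k = 0.01))

# Observed matrix: rows = ids 1..5, columns = t = 0..7, gaps = NA
W <- matrix(NA_real_, nrow = 5, ncol = 8,
            dimnames = list(paste0("values", 1:5), 0:7))
W[, 1] <- 100
W[cbind(obs$id, obs$t + 1)] <- obs$value

# Fill only the gaps with the fitted curve; recorded values are kept
pred <- predict(fit, newdata = data.frame(t = 0:7))
gaps <- is.na(W)
W[gaps] <- pred[col(W)[gaps]]
round(W, 1)
```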

Tags: r (Stack Overflow: Convert nonstandard list to standard matrix format)