admin管理员组

文章数量:1123931

I would like to run iterations of a single model, substituting one of a set of 34 different response variables in each iteration, and organize the results (from summary()) of all of those models into a table to be exported with write.csv(), with the results arranged row-wise, naming each row with the character name (not the column index) of each response variable.

I think that "all" I need to do is use apply() to achieve this, but how can I implement it?

Example model

mod1 <- gam(response.1+1 ~ s(predictor), bs = "cs", family = Gamma(link = "log"), method = "REML", data = ex.dat)

summary(mod1)

Example data: `

ex.dat <- 
structure(list(id = 1:20, site = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), predictor = c(3660.0384, 
3660.0384, 3660.0384, 3660.0384, 3660.0384, 3660.0384, 3660.0384, 
3499.7136, 3499.7136, 3499.7136, 3499.7136, 3477.1584, 3477.1584, 
3477.1584, 3477.1584, 3477.1584, 3477.1584, 3477.1584, 3477.1584, 
3477.1584), response.1 = c(65012L, 688717L, 0L, 0L, 338092L, 
216134L, 0L, 1719602L, 692439L, 0L, 0L, 432584L, 131228L, 113546L, 
0L, 928748L, 111161L, 393458L, 272371L, 2650333L), response.2 = c(114937L, 
0L, 0L, 0L, 736729L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 291939L, 232576L, 
0L, 0L, 0L, 0L, 0L, 0L), response.3 = c(709637L, 2487151L, 197847L, 
346499L, 2018978L, 617447L, 2467347L, 0L, 0L, 839880L, 637807L, 
3345034L, 3315138L, 739640L, 3739090L, 9092537L, 5014285L, 4383957L, 
4723850L, 6943785L)), row.names = c(NA, 20L), class = "data.frame")

I would like to run iterations of a single model, substituting one of a set of 34 different response variables in each iteration, and organize the results (from summary()) of all of those models into a table to be exported with write.csv(), with the results arranged row-wise, naming each row with the character name (not the column index) of each response variable.

I think that "all" I need to do is use apply() to achieve this, but how can I implement it?

Example model

mod1 <- gam(response.1+1 ~ s(predictor), bs = "cs", family = Gamma(link = "log"), method = "REML", data = ex.dat)

summary(mod1)

Example data: `

ex.dat <- 
structure(list(id = 1:20, site = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), predictor = c(3660.0384, 
3660.0384, 3660.0384, 3660.0384, 3660.0384, 3660.0384, 3660.0384, 
3499.7136, 3499.7136, 3499.7136, 3499.7136, 3477.1584, 3477.1584, 
3477.1584, 3477.1584, 3477.1584, 3477.1584, 3477.1584, 3477.1584, 
3477.1584), response.1 = c(65012L, 688717L, 0L, 0L, 338092L, 
216134L, 0L, 1719602L, 692439L, 0L, 0L, 432584L, 131228L, 113546L, 
0L, 928748L, 111161L, 393458L, 272371L, 2650333L), response.2 = c(114937L, 
0L, 0L, 0L, 736729L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 291939L, 232576L, 
0L, 0L, 0L, 0L, 0L, 0L), response.3 = c(709637L, 2487151L, 197847L, 
346499L, 2018978L, 617447L, 2467347L, 0L, 0L, 839880L, 637807L, 
3345034L, 3315138L, 739640L, 3739090L, 9092537L, 5014285L, 4383957L, 
4723850L, 6943785L)), row.names = c(NA, 20L), class = "data.frame")
Share Improve this question edited yesterday JKO asked yesterday JKOJKO 3071 silver badge13 bronze badges 1
  • response.1 and predictor are in your model formula but not your example data. – Eonema Commented yesterday
Add a comment  | 

1 Answer 1

Reset to default 2

You could inject the symbol for each response variable into the formula using rlang::expr():

fit_gam <- function(response_var) {
  # inject (using !!) the response_var into the formula:
  formula <- rlang::eval_tidy(
    rlang::expr(!!response_var + 1 ~ s(predictor, k = 3))
  )
  # pass the formula to `gam()`
  mod1 <- mgcv::gam(formula, bs = "cs", family = Gamma(link = "log"), 
    method = "REML", data = ex.dat)
  summary(mod1)
}

df <- 
  # generate a list of symbols for your response variables
  lapply(paste0("response.", 1:3), as.symbol) |>
  # map this to `fit_gam()`
  lapply(fit_gam) |>
  # row-bind the summary objects to make a data frame
  do.call(rbind, args = _) |> as.data.frame()

Some notes:

  • Since summary.gam() dumps so much info, the resulting data frame is very wide and has some columns for which each cell is a table in itself. You won't be able to write this to a csv. You could write a wrapper to use in place of summary() that just gets the info you want, for example:
extract_fit_info <- function(fit) {
  with(summary(fit), c(
    "R^2" = r.sq,
    "Intercept" = p.coeff[["(Intercept)"]]
  ))
}

When we drop this in in place of summary(), we get:

df

#>           R^2 Intercept
#> 1  0.02320420 12.869125
#> 2 -0.04244942  9.045422
#> 3  0.50855255 14.399754
  • If your variable names don't all follow a pattern, you could input them using alist(). For example, instead of lapply(paste0("response.", 1:3), as.symbol), do:
alist(response.1, response.2, response.3)
  • You can also do this in base R using base::call(). Instead of formula <- rlang::eval_tidy( ... ), use:
  formula <- as.formula(
    call("~", call("+", response_var, 1), quote(s(predictor, k = 3)))
  )

I think it is more readable with rlang, though.

  • We use lapply() instead of apply() since lapply() works well for lists while apply() is mostly useful for matrices. purrr::map() is another good option here.

  • Note that I added k = 3 to the formula so it wouldn't throw an error.

本文标签: rRunning parallel models and compiling resultsStack Overflow