admin管理员组

文章数量:1122853

I'm quite new to Bayesian statistics and not sure how to determine the dimensions of my priors. I am using code from Ben Bolker to run quite a big MCMCglmm with 52 response variables, 7 fixed effects, and 1 random effect. The response variables are arthropod taxa abundances, which are all counts with many, many zeros.

For MRE purposes, I scaled down the test to just the first 3 response variables and first 3 fixed effects. It looks like this:

datum <- read.csv("2024_data_raw_MCMCglmm.csv")

datum$Site <- factor(datum$Site) #my random effect variable

abund_vars <- c("Lycosidae","Gnaphosidae","Pholcidae") #my response variables
abund_vars_sc <- paste(abund_vars,"s",sep="_") #scaling variables
datum2 <- datum

fmla.MMLM1 <- cbind(Lycosidae, Gnaphosidae, Pholcidae) ~
  trait:(ShrubVolume + X.Leafy + P_cover_shrubs) - 1
#traits are my fixed effects

fam.test <- rep("zipoisson", 3) #same family for each variate

prior.model.1 <- list(R = list(V=diag(3)/3, nu=0.003),
                      G = list(G1=list(V=diag(3)/3, nu = 0.003)))
##what is wrong with my prior model?? the following line does not like it!!##

t2 <- system.time(
  MMLM1.fit <- MCMCglmm(fmla.MMLM1,
                        random=~ us(trait):Site,
                        rcov=~ us(trait):units,
                        prior = prior.model.1,
                        data = datum2,
                        family = fam.test,
                        nitt= 10000, burnin= 2000, thin=10)
)

Running the last line of code should do 10000 Markov chain Monte Carlo iterations that I can then summary() to get effect sizes and p-values (among other things I can't easily interpret), but I keep getting this error message:

Error in priorformat(if (NOpriorG) { : 
  V is the wrong dimension for some prior$G/prior$R elements
Timing stopped at: 0.21 0 0.22

From my understanding, V is supposed to be diag(3) because I have 3 response variables, in this case. What am I missing?

What confuses me even more is that the code worked just fine when my family was just "poisson" with these same priors. Does zero-inflated change what the priors need to be?

I'm quite new to Bayesian statistics and not sure how to determine the dimensions of my priors. I am using code from Ben Bolker to run quite a big MCMCglmm with 52 response variables, 7 fixed effects, and 1 random effect. The response variables are arthropod taxa abundances, which are all counts with many, many zeros.

For MRE purposes, I scaled down the test to just the first 3 response variables and first 3 fixed effects. It looks like this:

datum <- read.csv("2024_data_raw_MCMCglmm.csv")

datum$Site <- factor(datum$Site) #my random effect variable

abund_vars <- c("Lycosidae","Gnaphosidae","Pholcidae") #my response variables
abund_vars_sc <- paste(abund_vars,"s",sep="_") #scaling variables
datum2 <- datum

fmla.MMLM1 <- cbind(Lycosidae, Gnaphosidae, Pholcidae) ~
  trait:(ShrubVolume + X.Leafy + P_cover_shrubs) - 1
#traits are my fixed effects

fam.test <- rep("zipoisson", 3) #same family for each variate

prior.model.1 <- list(R = list(V=diag(3)/3, nu=0.003),
                      G = list(G1=list(V=diag(3)/3, nu = 0.003)))
##what is wrong with my prior model?? the following line does not like it!!##

t2 <- system.time(
  MMLM1.fit <- MCMCglmm(fmla.MMLM1,
                        random=~ us(trait):Site,
                        rcov=~ us(trait):units,
                        prior = prior.model.1,
                        data = datum2,
                        family = fam.test,
                        nitt= 10000, burnin= 2000, thin=10)
)

Running the last line of code should do 10000 Markov chain Monte Carlo iterations that I can then summary() to get effect sizes and p-values (among other things I can't easily interpret), but I keep getting this error message:

Error in priorformat(if (NOpriorG) { : 
  V is the wrong dimension for some prior$G/prior$R elements
Timing stopped at: 0.21 0 0.22

From my understanding, V is supposed to be diag(3) because I have 3 response variables, in this case. What am I missing?

What confuses me even more is that the code worked just fine when my family was just "poisson" with these same priors. Does zero-inflated change what the priors need to be?

Share Improve this question edited Jan 6 at 22:57 Joseph Coolidge asked Jan 6 at 21:57 Joseph CoolidgeJoseph Coolidge 12 bronze badges New contributor Joseph Coolidge is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct. 4
  • Please edit your question to provide a minimal reproducible example of your code and data along with relevant errors so that readers can run your code to answer your question. – M-- Commented Jan 6 at 22:05
  • Yes, the zero-inflated model needs different priors, I think the VCVs need to have twice the dimension for the poisson and zero-inflated parts of the model. The first part is for the Poisson, the second part for the zero-inflation. At least that is my impression from the zero-inflation section of the course notes, which I find very hard to understand. – Axeman Commented Jan 6 at 22:48
  • @M-- is this sufficient? – Joseph Coolidge Commented Jan 6 at 23:13
  • @JosephCoolidge we don't have that csv file, and uploading it externally is not going to work either. You need to create a small toy dataset that reproduces your issue. This seems like an involved issue, ao you probably need to read the link I shared above in details to improve your question further. I understand that this is not convenient, but my hope is that the effort will translate to you getting proper help. Cheers. – M-- Commented Jan 6 at 23:38
Add a comment  | 

1 Answer 1

Reset to default 1

tl;dr it looks like the covariance matrices need to be 6x6 instead of 3x3 (i.e., the model has 2 parameters per column of the response matrix rather than 1 ...). Below I'll show how I figured this out.

run with Poisson, original priors

(Code to set up example data is at the end of this answer.) To my surprise (since the original data set is tiny) this ran fine (although of course I didn't look at MCMC diagnostics etc.):

fam1 <- rep("poisson", 3)
prior.model.1 <- list(R = list(V=diag(3)/3, nu=0.003),
                      G = list(G1=list(V=diag(3)/3, nu = 0.003)))
fit1 <- MCMCglmm(form,
                 random=~ us(trait):Site,
                 rcov=~ us(trait):units,
                 prior = prior.model.1,
                 data = datum,
                 family = fam1)

now try with zipoisson

This replicates the error described above.

fam2 <- rep("zipoisson", 3)
fit2 <- MCMCglmm(form,
                 random=~ us(trait):Site,
                 rcov=~ us(trait):units,
                 prior = prior.model.1,
                 data = datum,
                 family = fam2)

what now?

When this happens I use a combination of traceback() (see the call stack when the problem happened), options(error=recover) (drop into browser when R encounters an error), and debug(MCMCglmm:::priorformat) to step through the code until we are right before the location of the error.

Then we run some bits of code in the browser. This is the test that fails:

any(dim(prior$V) != sum(nfl)) ## TRUE

Here dim(prior$V) is (3,3) and sum(nfl) is 6. This tells us that we need 6x6 rather than 3x3 prior covariance matrices.

This works:

prior.model.2 <- list(R = list(V=diag(6)/6, nu=0.003),
                      G = list(G1=list(V=diag(6)/6, nu = 0.003)))
fit3 <- MCMCglmm(form,
                 random=~ us(trait):Site,
                 rcov=~ us(trait):units,
                 prior = prior.model.2,
                 data = datum,
                 family = fam2)

PS you don't necessarily need a zero-inflated model for data with lots of zeros — such data may also be consistent with a Poisson model that has a low mean ...

Warton, David I. 2005. “Many Zeros Does Not Mean Zero Inflation: Comparing the Goodness-of-Fit of Parametric Models to Multivariate Abundance Data.” Environmetrics 16 (3): 275–89. https://doi.org/10.1002/env.702.


set up example

set.seed(101)

ab_vars <- (replicate(3, rpois(20, lambda = 0.2))
    |> data.frame()
    |> setNames(c("L","P","G"))
)
env_vars <-  (replicate(3, rnorm(20))
    |> data.frame()
    |> setNames(c("env1","env2","env3"))
)
datum <- data.frame(Site = factor(1:20), ab_vars, env_vars)
form <- cbind(L, P, G) ~ trait:(env1 + env2 + env3) - 1

本文标签: rPrior distribution for zeroinflated Poisson MCMCglmmStack Overflow