I'm fitting a multinomial logistic regression model using proc logistic in SAS, with around 3.6 million observations, an outcome with 5 levels, and dozens of categorical predictors. I had no issues running either univariate or multivariate models with param = ref.

However, once I switched to param = glm, the multivariate models started giving the warning "The information matrix is singular and thus the convergence is questionable. specifying a larger SINGULAR= value." After some research, I found that this message suggests a multicollinearity issue in the model. I then reduced the model to only 2 predictors, and it still gave the warning, even though the correlation matrix showed no correlation between the two predictors.

As far as I know, the only difference between param = ref and param = glm is the coding of the design matrix. param = ref uses full-rank reference coding, which creates k-1 dummy variables for a categorical predictor with k levels, whereas param = glm uses less-than-full-rank reference cell coding, which creates k dummy variables and sets the parameter for the reference level to zero. The two parameterizations should produce the same log-likelihood and estimates given the same reference level. To confirm this, I compared the results of the two models using only 2 predictors. Although param = glm throws the warning, its results are identical to those from param = ref, except for a set of zero estimates for the reference level of each predictor under param = glm. Could those be the cause?
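To make the coding difference concrete, here is a minimal sketch in plain Python rather than SAS. It builds the design matrix (intercept plus dummy columns) for a single hypothetical three-level predictor under each coding, then reports the column count and exact rank. The names `design_matrix` and `rank` and the toy data are my own illustration, not anything from proc logistic, and I am only assuming that this rank deficiency is related to the warning.

```python
from fractions import Fraction

def design_matrix(levels, observed, coding):
    """Intercept plus dummy columns for one categorical predictor.

    coding="ref": one dummy per non-reference level (last level is the reference).
    coding="glm": one dummy per level, including the reference level.
    """
    dummy_levels = levels if coding == "glm" else levels[:-1]
    return [[1] + [1 if value == lvl else 0 for lvl in dummy_levels]
            for value in observed]

def rank(matrix):
    """Exact matrix rank via Gaussian elimination over rational numbers."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank_count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on the earlier ones
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        # Eliminate this column from every other row.
        for r in range(len(rows)):
            if r != rank_count and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank_count][col]
                rows[r] = [a - factor * b
                           for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count

# Hypothetical toy data: one predictor with levels a, b, c.
levels = ["a", "b", "c"]
observed = ["a", "b", "c", "a", "c", "b"]

x_ref = design_matrix(levels, observed, "ref")
x_glm = design_matrix(levels, observed, "glm")

print("ref: columns =", len(x_ref[0]), "rank =", rank(x_ref))
print("glm: columns =", len(x_glm[0]), "rank =", rank(x_glm))
```

In this toy case the reference-coded matrix has as many columns as its rank, while the GLM-coded matrix has one more column than its rank, because the dummy columns sum to the intercept column. A matrix of the form X'WX built from such a design is singular by construction, regardless of any correlation between different predictors.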

My question is: why did the param = glm model throw a warning while the param = ref model did not? And more importantly, in this situation, should I trust the results of the param = ref model, given that no warning was displayed?

I appreciate any advice and suggestions. Thank you in advance.

Tags: SAS: "param=glm" gave a singular matrix warning while "param=ref" did not (Stack Overflow)