I want to manually set the number of predictors for lasso to 5, including the intercept term if any, so I can compare the predictors lasso selects with those of a different model.  With real data and about a hundred potential predictors, I set dfmax = 4 in glmnet.  The number of non-zero coefficients in the solution is 7, not 4.  Is clunky iterative fudging of lambda my best option?

Example:

library(glmnet)
set.seed(42)
n.obs <- 100
n.predictors <- 10
y <- rnorm(n.obs)
x <- matrix(rnorm(n.obs * n.predictors), ncol = n.predictors)
# Add some pattern to the data and give predictors different strengths
for (i in 1:n.predictors) x[, i] <- x[, i] + y * i
m.lambda <- cv.glmnet(x = x, y = y, alpha = 1)$lambda.min
m.lasso <- glmnet(x = x, y = y, alpha = 1, lambda = m.lambda, dfmax = 2)
coef(m.lasso)  
# All predictors are retained

Help for glmnet says dfmax is to “Limit the maximum number of variables in the model.  Useful for very large nvars, if a partial path is desired.”  Part of the problem may be that I don’t understand what a partial path is in this context and Googling hasn’t helped.  (For example, in Shortest Partial Path in a Graph, “partial path - is a path that doesn't have to visit every node”).
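One possible reading of "partial path", stated here as an assumption rather than something the help text spells out: glmnet fits a sequence of models over decreasing lambda values (the path), and dfmax would only cut that sequence short, so it has nothing to act on when a single lambda is supplied. A minimal sketch under that assumption fits the whole path on the example data and uses the `df` and `lambda` components of the fitted object to pick a model by predictor count. The names `target`, `idx` and `selected` are introduced here for illustration only.

```r
library(glmnet)

# Same simulated data as in the example above
set.seed(42)
n.obs <- 100
n.predictors <- 10
y <- rnorm(n.obs)
x <- matrix(rnorm(n.obs * n.predictors), ncol = n.predictors)
for (i in 1:n.predictors) x[, i] <- x[, i] + y * i

target <- 4  # hypothetical: desired number of non-zero predictors, intercept excluded

# No lambda supplied, so glmnet chooses its own decreasing lambda sequence
m.path <- glmnet(x = x, y = y, alpha = 1)

# m.path$df is the number of non-zero coefficients at each lambda.
# Take the last (smallest-lambda) fit that stays within the target.
idx <- max(which(m.path$df <= target))

lambda.sel <- m.path$lambda[idx]
selected <- which(m.path$beta[, idx] != 0)  # indices of the retained predictors

cat("lambda:", lambda.sel, " predictors kept:", length(selected), "\n")
print(selected)
```

The count at `idx` can fall short of `target` if several predictors enter the path between two consecutive lambda values; a finer lambda sequence (for example a larger `nlambda`) may help in that case.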

This question is the same as one of the questions in Behaviour of dfmax in glmnet, which looked perfect but has no answers.  A second question is relevant to me but was closed as off-topic on CrossValidated. Its only comment and only answer seem to contradict each other.  They do suggest the possibility that dfmax limits lasso to a specific set of predictors, not a specific count.

Follow-up: I ended up coding an iteration to find the multiplier for m.lambda that would yield the predictor count I wanted. A simple split-the-difference approach that updated floor and ceiling values for the multiplier converged readily for each of my half-dozen equations. I'll leave this question open in case someone does know and can share how to use dfmax, or failing that, for someone who wants a reminder that a homemade workaround may be easy.
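A split-the-difference search of the kind described above might be sketched as follows. This is an illustrative reconstruction, not the code that was actually run: the helper `count_nonzero`, the names `target`, `lo`, `hi` and `multiplier`, and the cap of 50 iterations are all assumptions. It also assumes the predictor count does not increase as lambda grows, which usually holds for lasso but is not guaranteed.

```r
library(glmnet)

# Same simulated data as in the example above
set.seed(42)
n.obs <- 100
n.predictors <- 10
y <- rnorm(n.obs)
x <- matrix(rnorm(n.obs * n.predictors), ncol = n.predictors)
for (i in 1:n.predictors) x[, i] <- x[, i] + y * i
m.lambda <- cv.glmnet(x = x, y = y, alpha = 1)$lambda.min

target <- 4  # hypothetical: desired number of non-zero predictors, intercept excluded

# Hypothetical helper: non-zero predictor count for a single lambda value
count_nonzero <- function(lambda) {
  glmnet(x = x, y = y, alpha = 1, lambda = lambda)$df
}

# Floor: multiplier 0 stands for the unpenalized fit (assumed to keep too many predictors)
lo <- 0
# Ceiling: double the multiplier until the count is at or below the target
hi <- 1
while (count_nonzero(m.lambda * hi) > target) hi <- hi * 2
n.sel <- count_nonzero(m.lambda * hi)

# Bisection: the ceiling always satisfies count <= target
for (iter in 1:50) {
  if (n.sel == target) break
  mid <- (lo + hi) / 2
  n.mid <- count_nonzero(m.lambda * mid)
  if (n.mid > target) {
    lo <- mid
  } else {
    hi <- mid
    n.sel <- n.mid
  }
}

multiplier <- hi
m.lasso <- glmnet(x = x, y = y, alpha = 1, lambda = m.lambda * multiplier)
coef(m.lasso)
```

If no lambda gives exactly `target` predictors, the loop stops at the iteration cap with the closest count below the target that it found.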

Tags: r; Limiting predictor count in glmnet; Stack Overflow