I'm running a Tweedie regression, and for any power >= 2 I get an error saying that some of my y values are outside the valid range of HalfTweedieLoss. As I understand it, the valid range of y for this loss is y > 0. All of my y values are > 0 and < 1, yet I still get the error, and I cannot figure out why.

sklearn version 1.3.0

I removed all rows with y <= 0 and double-checked the result with describe(). I expected the regressor to fit, or at least to give a clearer reason why it can't, since all of my y values are greater than 0. I know that the gamma distribution (power=2) is not a great fit for my data, but I was hoping to try power=3 (inverse Gaussian), and that fails as well.

power=0 and power=1 both work fine (normal and Poisson).
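For reference, the target ranges that scikit-learn documents for the Tweedie family differ by power: zeros are accepted for 1 <= power < 2 but not for power >= 2. The helper below is only a sketch of that rule for checking an array by hand; the name tweedie_target_ok is hypothetical and is not part of scikit-learn.

```python
import numpy as np


def tweedie_target_ok(y, power):
    """Return True if every value of y lies in the documented target range.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    y = np.asarray(y, dtype=float)
    if power <= 0:
        # Normal (power=0) and extreme stable (power<0): any real target.
        return True
    if power < 1:
        # No Tweedie distribution exists for 0 < power < 1.
        raise ValueError("power must be <= 0 or >= 1")
    if power < 2:
        # Poisson and compound Poisson-gamma: zeros are allowed.
        return bool(np.all(y >= 0))
    # Gamma (power=2), inverse Gaussian (power=3), ...: strictly positive.
    return bool(np.all(y > 0))


print(tweedie_target_ok([0.0, 0.5], power=1))
print(tweedie_target_ok([0.0, 0.5], power=2))
```

A target that contains an exact zero would therefore pass for power=1 and fail for power=2, which is the same pattern as the behaviour described here.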

Here is the description of my training y data (cv_y):

count |   616420.000000  
mean  |        0.955883  
std   |        0.021402  
min   |        0.700465  
25%   |        0.937018  
50%   |        0.954769  
75%   |        0.975716  
max   |        0.990000  

Here are the important elements of my code:

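from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import TweedieRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline

# X_transformer, y_transformer, scoring, refit_strategy, cv_X and cv_y
# are defined elsewhere in my code.

glr = TweedieRegressor()  # generalized linear regression model
X_pipeline = Pipeline([("preprocessor", X_transformer), ("model", glr)])
# The regressor is fitted on y_transformer's output, not on the raw cv_y.
estimator = TransformedTargetRegressor(regressor=X_pipeline, transformer=y_transformer)
family = "Tweedie"
link = "auto"
n_splits = 5
tscv = TimeSeriesSplit(gap=20, n_splits=n_splits)

param_grid = {
    "regressor__preprocessor__X_pca__whiten": [True, False],
    "regressor__model__power": [0, 1, 2],
    "regressor__model__alpha": [0.5],
    "regressor__model__fit_intercept": [True],
    "regressor__model__link": [link],
    "regressor__model__solver": ["newton-cholesky"],
    "regressor__model__max_iter": [5, 10],
    "regressor__model__tol": [1e-5],
    "regressor__model__verbose": [1],
}

gs = GridSearchCV(
    estimator=estimator,
    param_grid=param_grid,
    scoring=scoring,
    n_jobs=-1,
    refit=refit_strategy,
    cv=tscv,
    verbose=3,
    pre_dispatch=10,
    error_score="raise",  # surface the fitting error instead of scoring it as NaN
)

model = gs.fit(cv_X, cv_y)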
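One thing that may be worth checking: TransformedTargetRegressor fits the inner regressor on the transformed target, so the range check in HalfTweedieLoss sees the output of y_transformer rather than the raw cv_y. The sketch below uses synthetic data and assumes, purely for illustration, that y_transformer is a MinMaxScaler; the actual transformer is not shown above, so this is a hypothesis to test rather than a diagnosis.

```python
import warnings

import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.preprocessing import MinMaxScaler

warnings.filterwarnings("ignore")  # hide convergence warnings in this toy example

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic stand-in for cv_y: strictly between 0 and 1, like the summary above.
y = rng.uniform(0.70, 0.99, size=200)

# ASSUMPTION: MinMaxScaler is a hypothetical stand-in for the real y_transformer.
y_transformer = MinMaxScaler()
y_scaled = y_transformer.fit_transform(y.reshape(-1, 1)).ravel()

# The raw minimum is positive, but the scaled minimum is mapped to zero.
print("raw min:", y.min(), "| transformed min:", y_scaled.min())


def can_fit(power, target):
    """Try to fit a TweedieRegressor; return False if the target is rejected."""
    try:
        TweedieRegressor(power=power, link="auto").fit(X, target)
        return True
    except ValueError as exc:
        print(f"power={power}: {exc}")
        return False


# Transformed target: contains an exact zero.
accepted_scaled = {p: can_fit(p, y_scaled) for p in (0, 1, 2, 3)}
# Raw target: strictly positive.
accepted_raw = {p: can_fit(p, y) for p in (2, 3)}
print("transformed:", accepted_scaled)
print("raw:", accepted_raw)
```

In the real pipeline, the equivalent check would be to call describe() on the output of y_transformer.fit_transform for a training fold, since the transformer is refitted on each split.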


