I am migrating from XGBoost to LightGBM (since I need its exact handling of interaction constraints) and I am struggling to understand the result of LightGBM's cv(). In the example below, the minimum log-loss is achieved on iteration 125, but model['cvbooster'].best_iteration returns -1. I would have expected it to return 125 as well, or am I misunderstanding something here? Is there a better way to get the best iteration, or does one just need to check manually?

I have seen this discussion, but even when I check the individual boosters in cvbooster (e.g., model['cvbooster'].boosters[0].best_iteration), they all return -1 as well...

import lightgbm as lgb
import numpy as np
from sklearn import datasets

X, y = datasets.make_classification(n_samples=10_000, n_features=5, n_informative=3, random_state=9)

data_train_lgb = lgb.Dataset(X, label=y)

param = {'objective':   'binary',
         'metric':      ['binary_logloss'],
         'device_type': 'cuda'}

model = lgb.cv(param,
               data_train_lgb,
               num_boost_round=1_000,
               return_cvbooster=True)

opt_1 = np.argmin(model['valid binary_logloss-mean'])
print(f"index argmin: {opt_1}")
print(f"logloss argmin: {model['valid binary_logloss-mean'][opt_1]}")

opt_2 = model['cvbooster'].best_iteration
print(f"index best_iteration: {opt_2}")
print(f"logloss best_iteration: {model['valid binary_logloss-mean'][opt_2]}")

---

>>> index argmin: 125
>>> logloss argmin: 0.13245999867688793

>>> index best_iteration: -1
>>> logloss best_iteration: 0.2661896445658779


1 Answer


In lightgbm (the Python package for LightGBM), best_iteration isn't simply the iteration where the model achieved the best performance on the evaluation metrics: it is only set when early stopping is used, in which case it holds the last (1-based) iteration at which performance on the evaluation metrics improved. Without early stopping, it keeps its default value of -1, which is what you are seeing.

See this example (using lightgbm==4.5.0, scikit-learn==1.5.0, and Python 3.11).

import lightgbm as lgb
import numpy as np
from sklearn import datasets

X, y = datasets.make_classification(
    n_samples=10_000,
    n_features=5,
    n_informative=3,
    random_state=9
)

params = {
    "deterministic": True,
    "objective": "binary",
    "metric": "binary_logloss",
    "seed": 708
}

# train without early stopping
model = lgb.cv(
    params=params,
    train_set=lgb.Dataset(X, label=y),
    num_boost_round=200,
    return_cvbooster=True
)

model['cvbooster'].best_iteration
# -1

opt_1 = np.argmin(model['valid binary_logloss-mean'])
print(f"index argmin: {opt_1}")
# index argmin: 114
print(f"logloss argmin: {model['valid binary_logloss-mean'][opt_1]:.6f}")
# logloss argmin: 0.132579

# train WITH early stopping
model = lgb.cv(
    params={**params, "early_stopping_rounds": 5},
    train_set=lgb.Dataset(X, label=y),
    num_boost_round=200,
    return_cvbooster=True
)

model['cvbooster'].best_iteration
# 115

opt_1 = np.argmin(model['valid binary_logloss-mean'])
print(f"index argmin: {opt_1}")
# index argmin: 114
print(f"logloss argmin: {model['valid binary_logloss-mean'][opt_1]:.6f}")
# logloss argmin: 0.132579

Notes on that:

  • adding "deterministic": True and setting "seed" to a positive value helps make training deterministic
  • early stopping in cv() can be enabled by passing a positive value for "early_stopping_rounds" through params, or via a callback (see the sketch below)
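
And since you asked whether one "just needs to manually check": the per-iteration means returned by cv() are 0-indexed, so argmin + 1 gives the 1-based best iteration, with or without early stopping. The sketch below also shows early stopping enabled through lightgbm's callbacks API (lgb.early_stopping()) instead of through params; the outputs shown are what I would expect given the deterministic settings above, so treat them as illustrative.

import lightgbm as lgb
import numpy as np
from sklearn import datasets

X, y = datasets.make_classification(
    n_samples=10_000,
    n_features=5,
    n_informative=3,
    random_state=9
)

params = {
    "deterministic": True,
    "objective": "binary",
    "metric": "binary_logloss",
    "seed": 708
}

# same training run as above, but early stopping enabled via a
# callback instead of "early_stopping_rounds" in params
model = lgb.cv(
    params=params,
    train_set=lgb.Dataset(X, label=y),
    num_boost_round=200,
    callbacks=[lgb.early_stopping(stopping_rounds=5)],
    return_cvbooster=True
)

model['cvbooster'].best_iteration
# 115 (expected, given the deterministic settings above)

# whether or not early stopping was used, the best round can be
# recovered manually: np.argmin() returns a 0-based index into the
# per-iteration means, so add 1 to get the 1-based iteration
best_round = int(np.argmin(model['valid binary_logloss-mean'])) + 1
print(best_round)
# 115 (expected; matches best_iteration when early stopping is on)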
