I have come across something strange when plotting a decision tree in sklearn.

I just wanted to compare two Random Forest models, each consisting of a single estimator: one with bootstrapping and one without.

I am using a toy dataset with sample size n = 5.

My code looks like the following:

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import plot_tree

tree_model = RandomForestRegressor(
    n_estimators=1,
    max_features=2,
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=42,
    bootstrap=True  # (or False)
)
tree_model.fit(X_train, y_train)
single_tree = tree_model.estimators_[0]

fig = plt.figure(figsize=(14, 14), facecolor="white", dpi=300)
ax = fig.add_subplot(111)
plot_tree(single_tree, feature_names=["X1", "X2"], filled=True, impurity=False, rounded=False, ax=ax)

Now, when plotting the tree without bootstrapping, everything looks fine: the root node starts with 5 samples. But when plotting the tree with bootstrapping, the root node starts with only 4 samples. How is that even possible? Regardless of whether bootstrapping is used, the training set consists of 5 samples, so the root node should start with 5 samples.
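A minimal, self-contained reproduction sketch of what I'm seeing. The toy data below (`X_train`, `y_train` with n = 5) is a made-up stand-in for my actual dataset; instead of rendering the plot, it reads the root node's sample count directly from the fitted tree's `tree_.n_node_samples`, which is the same value `plot_tree` displays as "samples":

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Assumed toy dataset with 5 samples and 2 features (placeholder values)
X_train = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

root_samples = {}
for bootstrap in (False, True):
    model = RandomForestRegressor(
        n_estimators=1,
        max_features=2,
        random_state=42,
        bootstrap=bootstrap,
    )
    model.fit(X_train, y_train)
    # n_node_samples[0] is the sample count shown at the root node by plot_tree
    root_samples[bootstrap] = int(model.estimators_[0].tree_.n_node_samples[0])

print(root_samples)
```

Without bootstrapping the root reports all 5 samples; with bootstrapping it reports fewer, which is the behavior I don't understand.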

Tags: scikit-learn · Plotting one Decision Tree of a Random Forest in sklearn · Stack Overflow