According to the sklearn docs, if you apply predict_proba to DecisionTreeClassifier:
The predicted class probability is the fraction of samples of the same class in a leaf.
Let's say that the rows where class = 1 in my training dataset look like this:
feature_1 | feature_2 | class
----------|-----------|------
A         | C         | 1
A         | C         | 1
A         | D         | 1
B         | C         | 1
B         | D         | 1
I'm interpreting the docs to mean that if I trained a model on this data, predict_proba would tell me that a row where feature_1 = A and feature_2 = C would have a 40% chance of falling under class 1. This is because there are five rows total where class = 1, two of which also have feature_1 = A and feature_2 = C. Two is 40% of five.
Obviously this is a very simple example, but I'm just trying to understand the general methodology predict_proba uses.
Is my interpretation correct? I would have thought that in this case, the probability of class 1 would be at least partially affected by any rows in the training dataset where feature_1 = A, feature_2 = C, and class != 1. A sketch of the setup I have in mind follows below.
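For concreteness, here is a minimal sketch of the experiment I have in mind. The numeric encoding (A/B -> 0/1, C/D -> 0/1) and the extra class-0 row are my own assumptions: scikit-learn trees need numeric input, and fitting a classifier on a single class would be degenerate.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([
    [0, 0],  # A, C, class 1
    [0, 0],  # A, C, class 1
    [0, 1],  # A, D, class 1
    [1, 0],  # B, C, class 1
    [1, 1],  # B, D, class 1
    [0, 0],  # A, C, class 0 (hypothetical row, added so two classes exist)
])
y = np.array([1, 1, 1, 1, 1, 0])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Does this print 0.40 for class 1, as my reading of the docs suggests?
print(clf.predict_proba([[0, 0]]))
```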
1 Answer
First of all, scikit-learn's decision trees operate on numerical features, not categorical ones, so values like A/B and C/D must be encoded numerically before training. The learning algorithm then greedily searches for decision thresholds that optimize the chosen criterion (Gini impurity by default), and keeps splitting until a stopping criterion (like max_depth) is hit. When a node stops splitting, it becomes a leaf. At that leaf, after applying all the conditions on the path to it (feature_1 <= threshold_1, feature_2 > threshold_2, and so on), we might be left with samples from different classes. This is where the cited rule comes in: the predicted probability for each class, in that specific leaf, is set to the observed class proportions among the training samples that landed there.
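To make this concrete, here is a hedged sketch (not from the original answer) reusing the question's toy data, ordinal-encoded and padded with one assumed class-0 (A, C) row so the tree has two classes to separate:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[0, 0], [0, 0], [0, 1], [1, 0], [1, 1], [0, 0]])
y = np.array([1, 1, 1, 1, 1, 0])  # last row: (A, C) with class 0

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned thresholds; the three identical (A, C) rows cannot be
# separated by any threshold, so they all end up in the same leaf.
print(export_text(clf, feature_names=["feature_1", "feature_2"]))

# apply() shows which leaf each training sample lands in.
print(clf.apply(X))

# That leaf holds 2 class-1 samples and 1 class-0 sample, so the
# predicted probabilities for an (A, C) input are the within-leaf
# fractions [1/3, 2/3], not 2/5 = 40% over all class-1 rows.
print(clf.predict_proba([[0, 0]]))
```

With this data the last line prints [[0.333..., 0.666...]], which also answers the follow-up in the question: rows with the same feature values but a different class do affect the probability, precisely because they end up in the same leaf.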