admin管理员组文章数量:1332865
I want to plot the SHAP values for my RSF model; here is the code and error:
xvars <- c("RIDRETH1", "RXDLIPID", "DRXTKCAL", "DRXTPROT", "DRXTCARB", "DRXTCHOL", "DRXTFIBE", "DRXTVARA", "DRXTATOC", "DRXTSODI", "DRXTPOTA", "DRXTM161", "DRXTM181", "DRXTM201", "DRXTM221", "DRXTP182", "DRXTP183", "DRXTP184", "DRXTP204", "DRXTP205", "DRXTP225", "DRXTP226", "DRXTRET", "DRXT_G_TOTAL", "DRXT_V_STARCHY_TOTAL", "DRXTS160", "DRXTS180", "DRXTsumSFA", "INDFMPIR", "LBXCOT", "GENDERRC")
X <- Data[sample(nrow(Data), 1000), xvars]
bg_X <- Data[sample(nrow(Data), 200), ]
system.time(
ks <- kernelshap(rf_mort_nutrients_withoutage_1018_all, X, bg_X = bg_X, type = 'prob')
)
ks
ks <- shapviz(ks)
sv_importance(ks, kind = "bee", )
Error: Fejl i align_pred(pred_fun(object, bg_X, ...)) : Predictions must be numeric! Timing stopped at: 0.03 0.05 0.11
These are my predictions:
rf_mort_nutrients_withoutage_1018_all$predicted
[1] 81.31376 75.82491 99.35944 58.63055 67.65847 98.32906 75.33934 107.81604 62.22175 75.69875 69.99881 83.67161 81.39735 65.59381
I am not sure why it is not working. Anyone has an idea?
I want to plot the SHAP values for my RSF model; here is the code and error:
xvars <- c("RIDRETH1", "RXDLIPID", "DRXTKCAL", "DRXTPROT", "DRXTCARB", "DRXTCHOL", "DRXTFIBE", "DRXTVARA", "DRXTATOC", "DRXTSODI", "DRXTPOTA", "DRXTM161", "DRXTM181", "DRXTM201", "DRXTM221", "DRXTP182", "DRXTP183", "DRXTP184", "DRXTP204", "DRXTP205", "DRXTP225", "DRXTP226", "DRXTRET", "DRXT_G_TOTAL", "DRXT_V_STARCHY_TOTAL", "DRXTS160", "DRXTS180", "DRXTsumSFA", "INDFMPIR", "LBXCOT", "GENDERRC")
X <- Data[sample(nrow(Data), 1000), xvars]
bg_X <- Data[sample(nrow(Data), 200), ]
system.time(
ks <- kernelshap(rf_mort_nutrients_withoutage_1018_all, X, bg_X = bg_X, type = 'prob')
)
ks
ks <- shapviz(ks)
sv_importance(ks, kind = "bee", )
Error: Fejl i align_pred(pred_fun(object, bg_X, ...)) : Predictions must be numeric! Timing stopped at: 0.03 0.05 0.11
These are my predictions:
rf_mort_nutrients_withoutage_1018_all$predicted
[1] 81.31376 75.82491 99.35944 58.63055 67.65847 98.32906 75.33934 107.81604 62.22175 75.69875 69.99881 83.67161 81.39735 65.59381
I am not sure why it is not working. Anyone has an idea?
Share edited Nov 22, 2024 at 17:03 Ben Reiniger 12.7k3 gold badges22 silver badges40 bronze badges asked Nov 22, 2024 at 6:49 mtvprmtvpr 111 silver badge1 bronze badge 4 |1 Answer
Reset to default 1To analyze continuous rank probability scores, you can work like this:
library(randomForestSRC)
library(survival)
library(kernelshap)
library(shapviz)
head(veteran)
# trt celltype time status karno diagtime age prior
# 1 1 squamous 72 1 60 7 69 0
# 2 1 squamous 411 1 70 5 64 10
# 3 1 squamous 228 1 60 3 38 0
xvars <- setdiff(colnames(veteran), c("time", "status"))
fit <- rfsrc(
reformulate(xvars, "Surv(time, status)"),
data = veteran,
ntree = 50,
nodesize = 20,
importance = TRUE
)
# Function that returns continuous rank probability scores
pred_fun <- function(model, data) {
predict(model, data)$predicted
}
# Sample <=1000 rows from the training data. veteran is small enough to use all
X_explain <- veteran[xvars]
sv <- kernelshap(fit, X = X_explain, pred_fun = pred_fun) |>
shapviz()
sv |> sv_importance(kind = "bee")
sv |> sv_dependence(xvars)
本文标签: rSHAP values for random survival forestStack Overflow
版权声明:本文标题:r - SHAP values for random survival forest - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1742293293a2448219.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
predict(rf_mort_nutrients_withoutage_1018_all, X, type = 'prob')
? You should get numeric output. Without reproducible example, we won't be able to help. – Michael M Commented Nov 22, 2024 at 18:21Sample size of test (predict) data: 1000 Number of grow trees: 200 Average no. of grow terminal nodes: 35.705 Total no. of grow variables: 31 Resampling used to grow trees: swor Resample size used to grow trees: 37243 Analysis: RSF Family: surv
– mtvpr Commented Nov 25, 2024 at 9:16rf_mort_nutrients <- rfsrc(Surv(endage, mortality_status) ~ . , data = data, ntree = 200, nodesize = 1000, importance = T)
and I would like to plot the SHAP values to understand which variables contribute most to mortality and in which direction, ie, how feature value alters the prediction. Is there a method compatible with random survival forests to have this? – mtvpr Commented Nov 27, 2024 at 9:36