admin管理员组文章数量:1122846
After performing multiple imputation using R's mice
package, I have 50 datasets with time-to-event data for two treatments: drug A & drug B. I performed a Cox proportional hazards model coxph()
with inverse probability of treatment weights (IPtW) on each of the dataset, to retrieve the Hazards Ratio (HR) and for survival analysis (KM curve). For the adjusted KM curve I use the adjustedCurves
R package, and store the data ($adj
) of each of the 50 adjustedsurv()
objects in a list.
Edit: Note that I can only calculate my IPt weights after multiple imputation: one of the variables (score
) that contained NA
values is used to calculate the weights:
library(ggplot2)
library(MASS)
library(data.table)
library(adjustedCurves)
library(survival)
library(survminer)
# datasets is a list containing 50 imputed datasets
for (i in 1:length(datasets)){
# create IPtW denominator Pr[A|L] using binomial model (treat=0 or 1)
ipw.denom <- glm(treatment ~ period + score + age + sex,
family=binomial(link="logit"),
data=datasets[[i]])
# create IPtW numerator Pr[A] using binomial model (treat=0 or 1)
ipw.num <- glm(treatment ~ 1,
family=binomial(link="logit"),
data=datasets[[i]])
# predict for each subject
datasets[[i]]$ipw.num <- predict(ipw.num, datasets[[i]], type="response")
datasets[[i]]$ipw.denom <- predict(ipw.denom, datasets[[i]], type="response")
# create stabilized weights for each subject
#if treatment=1 -> ipw.num/ipw.denom
#if treatment=0 -> (1-ipw.num)/(1-ipw.denom)
datasets[[i]]$sw <- ifelse(datasets[[i]]$treatment==1,
datasets[[i]]$ipw.num/datasets[[i]]$ipw.denom,
(1-datasets[[i]]$ipw.num)/(1-datasets[[i]]$ipw.denom))
# For each dataset, apply a cox proportional hazards model
cox_list[[i]] <- coxph(
formula = Surv(time_to_event, outcome) ~ treatment + as.factor(period) + score + age + sex,
data = datasets[[i]],
weights = datasets[[i]]$sw, x=TRUE)
# Use the adjustedCurve package to create adjusted KM curves:
adjsurv <- adjustedsurv(data=datasets[[i]],
variable="treatment",
ev_time="time_to_event",
event="outcome",
method="direct",
outcome_model=cox_list[[i]],
conf_int=TRUE)
# Store the data, to apply Rubin's rule
cox_data[[i]] <- adjsurv$adj
Edit: I was notfied by @Denzo that this approach in adjustedsurv()
is not doubly robust, and should use method="aiptw"
with treatment_model=
instead.
/edit
Using Rubin's rule I managed to pool the data and capture both the within variance in each dataset as well as the between-each-imputed-dataset variance. And thereby managed to create a data table ready for a KM curve:
#Rubin's rule to calculate CIs for each time point:
# Convert SE to within variance
for (i in 1:length(datasets)){
cox_data[[i]]$w_var <- cox_data[[i]]$se^2
}
# Combine all data tables into one large data table and use dplyr for the Rubin's Rule steps:
cox_data_comb <- rbindlist(cox_data, idcol = "source_id")
# For the 50 datasets...
result <- cox_data_comb %>%
group_by(group, time) %>% # ...For each timepoint per treatment...
mutate(b_var = var(surv)) %>% # ...take the var of the mean surv prob to get the between 50 datasets var
mutate(t_var = w_var + (1 + 1/length(datasets)) * b_var) %>% # ...and calculate the total variance
mutate(pooled_se = sqrt(t_var)) %>% # Create pooled SE
mutate(lower_bound = surv - z_value * pooled_se) %>% # ...and pooled lower CI
mutate(upper_bound = surv + z_value * pooled_se) %>% # ...and pooled upper CI
ungroup()
And plot the results in a custom KM curve using ggplot2
and the results
table:
ggplot(result, aes(x = time, y = surv, color = as.factor(group), group = group)) +
geom_line(size = 1) + # Line for mean values
# ribbon for CIs:
geom_ribbon(aes(ymin = lower_bound, ymax = upper_bound, fill = as.factor(group)),
alpha = 0.2, color = NA) + # Shaded area for CI
scale_color_manual(values = c("blue", "red"), name = "Treatment", labels=c('drug A', 'drug B')) + # Custom colors
scale_fill_manual(values = c("blue", "red"), name = "Treatment", labels=c('drug A', 'drug B')) +
labs(title = "Probability of experiencing a first event",
x = "Time (days)",
y = "Probability",
color = "Treatment",
fill = "Treatment") +
theme_minimal() +
theme(legend.position = "top")
Lovely innit? But what about the Risk table that's usually plotted below the curve?
I know that the risk table is hidden in each adjustedsurv()
object, due to the possibility of plotting the risk_table in its follow-up function plot.adjustedsurv()
, but I can't find it nor do I know how to pool these 50 graphical risk tables into one graphical risk table.
I am also thinking of creating at_risk tables myself using data.table
by calculating probability at time t * size of each treatment arm (again the strata is on treatment). Could the gt
package be of help here? I cannot find anything in gt
regarding risk tables or longitudinal data in wide-format.
After performing multiple imputation using R's mice
package, I have 50 datasets with time-to-event data for two treatments: drug A & drug B. I performed a Cox proportional hazards model coxph()
with inverse probability of treatment weights (IPtW) on each of the dataset, to retrieve the Hazards Ratio (HR) and for survival analysis (KM curve). For the adjusted KM curve I use the adjustedCurves
R package, and store the data ($adj
) of each of the 50 adjustedsurv()
objects in a list.
Edit: Note that I can only calculate my IPt weights after multiple imputation: one of the variables (score
) that contained NA
values is used to calculate the weights:
library(ggplot2)
library(MASS)
library(data.table)
library(adjustedCurves)
library(survival)
library(survminer)
# datasets is a list containing 50 imputed datasets
for (i in 1:length(datasets)){
# create IPtW denominator Pr[A|L] using binomial model (treat=0 or 1)
ipw.denom <- glm(treatment ~ period + score + age + sex,
family=binomial(link="logit"),
data=datasets[[i]])
# create IPtW numerator Pr[A] using binomial model (treat=0 or 1)
ipw.num <- glm(treatment ~ 1,
family=binomial(link="logit"),
data=datasets[[i]])
# predict for each subject
datasets[[i]]$ipw.num <- predict(ipw.num, datasets[[i]], type="response")
datasets[[i]]$ipw.denom <- predict(ipw.denom, datasets[[i]], type="response")
# create stabilized weights for each subject
#if treatment=1 -> ipw.num/ipw.denom
#if treatment=0 -> (1-ipw.num)/(1-ipw.denom)
datasets[[i]]$sw <- ifelse(datasets[[i]]$treatment==1,
datasets[[i]]$ipw.num/datasets[[i]]$ipw.denom,
(1-datasets[[i]]$ipw.num)/(1-datasets[[i]]$ipw.denom))
# For each dataset, apply a cox proportional hazards model
cox_list[[i]] <- coxph(
formula = Surv(time_to_event, outcome) ~ treatment + as.factor(period) + score + age + sex,
data = datasets[[i]],
weights = datasets[[i]]$sw, x=TRUE)
# Use the adjustedCurve package to create adjusted KM curves:
adjsurv <- adjustedsurv(data=datasets[[i]],
variable="treatment",
ev_time="time_to_event",
event="outcome",
method="direct",
outcome_model=cox_list[[i]],
conf_int=TRUE)
# Store the data, to apply Rubin's rule
cox_data[[i]] <- adjsurv$adj
Edit: I was notfied by @Denzo that this approach in adjustedsurv()
is not doubly robust, and should use method="aiptw"
with treatment_model=
instead.
/edit
Using Rubin's rule I managed to pool the data and capture both the within variance in each dataset as well as the between-each-imputed-dataset variance. And thereby managed to create a data table ready for a KM curve:
#Rubin's rule to calculate CIs for each time point:
# Convert SE to within variance
for (i in 1:length(datasets)){
cox_data[[i]]$w_var <- cox_data[[i]]$se^2
}
# Combine all data tables into one large data table and use dplyr for the Rubin's Rule steps:
cox_data_comb <- rbindlist(cox_data, idcol = "source_id")
# For the 50 datasets...
result <- cox_data_comb %>%
group_by(group, time) %>% # ...For each timepoint per treatment...
mutate(b_var = var(surv)) %>% # ...take the var of the mean surv prob to get the between 50 datasets var
mutate(t_var = w_var + (1 + 1/length(datasets)) * b_var) %>% # ...and calculate the total variance
mutate(pooled_se = sqrt(t_var)) %>% # Create pooled SE
mutate(lower_bound = surv - z_value * pooled_se) %>% # ...and pooled lower CI
mutate(upper_bound = surv + z_value * pooled_se) %>% # ...and pooled upper CI
ungroup()
And plot the results in a custom KM curve using ggplot2
and the results
table:
ggplot(result, aes(x = time, y = surv, color = as.factor(group), group = group)) +
geom_line(size = 1) + # Line for mean values
# ribbon for CIs:
geom_ribbon(aes(ymin = lower_bound, ymax = upper_bound, fill = as.factor(group)),
alpha = 0.2, color = NA) + # Shaded area for CI
scale_color_manual(values = c("blue", "red"), name = "Treatment", labels=c('drug A', 'drug B')) + # Custom colors
scale_fill_manual(values = c("blue", "red"), name = "Treatment", labels=c('drug A', 'drug B')) +
labs(title = "Probability of experiencing a first event",
x = "Time (days)",
y = "Probability",
color = "Treatment",
fill = "Treatment") +
theme_minimal() +
theme(legend.position = "top")
Lovely innit? But what about the Risk table that's usually plotted below the curve?
I know that the risk table is hidden in each adjustedsurv()
object, due to the possibility of plotting the risk_table in its follow-up function plot.adjustedsurv()
, but I can't find it nor do I know how to pool these 50 graphical risk tables into one graphical risk table.
I am also thinking of creating at_risk tables myself using data.table
by calculating probability at time t * size of each treatment arm (again the strata is on treatment). Could the gt
package be of help here? I cannot find anything in gt
regarding risk tables or longitudinal data in wide-format.
1 Answer
Reset to default 1Both multiple imputation and the subsequent plotting of risk tables are supported directly in adjustedsurv()
, so you really don't have to do all of that. You can do this instead:
- 1.) Fit one Cox-model per imputed dataset using the
with()
function as described in themice
package. - 2.) Supply the
mids
object including the multiple imputed datasets directly to thedata
argument ofadjustedsurv()
and themira
object from step 1.) to theoutcome_model
argument. - 3.) Just call
plot()
on the output object, it will directly perform the pooling for both the survival curves themselves and the risk table.
A small example:
library(adjustedCurves)
library(survival)
library(mice)
set.seed(42)
# simulate some example data with missing values in x1
sim_dat <- sim_confounded_surv(n=250, max_t=1.2)
sim_dat$group <- as.factor(sim_dat$group)
sim_dat$x1 <- ifelse(runif(n=250) < 0.5, sim_dat$x1, NA)
# perform multiple imputation
mids <- mice(data=sim_dat, method="pmm", m=50, printFlag=FALSE)
# fit a cox model for each dataset
mira <- with(mids, coxph(Surv(time, event) ~ x1 + x2 + x4 + x5 + group, x=TRUE))
# estimate and plot adjusted survival curves
adj <- adjustedsurv(data=mids,
variable="group",
ev_time="time",
event="event",
method="direct",
outcome_model=mira)
plot(adj, risk_table=TRUE)
I have two further comments on your analysis. First, if you want to combine IPt weights (or propensity scores) with an outcome model, you should probably use method="aiptw"
or method="aiptw_pseudo"
instead. In particular, using a Cox model fitted with weights in the way you did is not doubly-robust and may lead to bias, even if either the model or the weights were correctly estimated. This is described in the associated article (https://doi.org/10.1002/sim.9681).
Secondly, risk tables may be misleading when using methods other than "km"
, "iptw_km"
or "iptw_cox"
, because all methods other than these 3 do not rely (soley) on the numbers displayed in the table to estimate the adjusted survival curves. See ?plot.adjustedsurv
for a more detailed discussion.
Also, just as a side-note, the risk-table is not actually included in the output of the adjustedsurv()
function. It is calculated only when plot.adjustedsurv()
is called, using the information included in the output of adjustedsurv()
(which is why the output includes the input data
etc.).
EDIT:
To use the same approach for method="aiptw"
you can simply estimate propensity score models for each imputed dataset:
mira2 <- with(mids, glm(group ~ x1 + x2 + x4 + x5, family="binomial"))
and then supply those to the appropriate argument:
adj <- adjustedsurv(data=mids,
variable="group",
ev_time="time",
event="event",
method="aiptw",
outcome_model=mira,
treatment_model=mira2)
plot(adj, risk_table=TRUE)
The AIPTW method does not use weights directly, it uses the propensity score estimated using the logistic regression models. There is no method implemented in adjustedCurves
(or elsewhere) that uses weights and an outcome model to estimate adjusted survival curves. They all use propensity scores. If you want to use only stabilized weights and are willing to skip the Cox model, you could of course do so using either method="iptw_km"
, method="iptw_cox"
or method="iptw_pseudo"
. The documentation of the package includes explanations and examples for all of this.
本文标签: ggplot2Pool 50 risk tables from the Adjusted Curves package in RStack Overflow
版权声明:本文标题:ggplot2 - Pool 50 risk tables from the Adjusted Curves package in R - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1736309148a1933914.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论