admin管理员组

文章数量:1122846

After performing multiple imputation using R's mice package, I have 50 datasets with time-to-event data for two treatments: drug A & drug B. I performed a Cox proportional hazards model coxph() with inverse probability of treatment weights (IPtW) on each of the dataset, to retrieve the Hazards Ratio (HR) and for survival analysis (KM curve). For the adjusted KM curve I use the adjustedCurves R package, and store the data ($adj) of each of the 50 adjustedsurv() objects in a list.

Edit: Note that I can only calculate my IPt weights after multiple imputation: one of the variables (score) that contained NA values is used to calculate the weights:

library(ggplot2)
library(MASS)
library(data.table)
library(adjustedCurves)
library(survival)
library(survminer)

# datasets is a list containing 50 imputed datasets 
for (i in 1:length(datasets)){
    # create IPtW denominator Pr[A|L] using binomial model (treat=0 or 1)
    ipw.denom <- glm(treatment  ~ period + score + age + sex,
          family=binomial(link="logit"),
          data=datasets[[i]])

    # create IPtW numerator Pr[A] using binomial model (treat=0 or 1)
    ipw.num <- glm(treatment  ~ 1,
               family=binomial(link="logit"),
               data=datasets[[i]])

    # predict for each subject
    datasets[[i]]$ipw.num <- predict(ipw.num, datasets[[i]], type="response")
    datasets[[i]]$ipw.denom <- predict(ipw.denom, datasets[[i]], type="response")

    # create stabilized weights for each subject
    #if treatment=1 -> ipw.num/ipw.denom
    #if treatment=0 -> (1-ipw.num)/(1-ipw.denom)
    datasets[[i]]$sw <- ifelse(datasets[[i]]$treatment==1,
                               datasets[[i]]$ipw.num/datasets[[i]]$ipw.denom,
                               (1-datasets[[i]]$ipw.num)/(1-datasets[[i]]$ipw.denom))

    # For each dataset, apply a cox proportional hazards model
    cox_list[[i]] <- coxph(
        formula = Surv(time_to_event, outcome) ~ treatment + as.factor(period) + score + age + sex,
        data = datasets[[i]],  
        weights = datasets[[i]]$sw, x=TRUE)

    # Use the adjustedCurve package to create adjusted KM curves:
    adjsurv <- adjustedsurv(data=datasets[[i]],
                        variable="treatment",
                        ev_time="time_to_event",
                        event="outcome",
                        method="direct",
                        outcome_model=cox_list[[i]],
                        conf_int=TRUE)

    # Store the data, to apply Rubin's rule
    cox_data[[i]] <- adjsurv$adj

Edit: I was notfied by @Denzo that this approach in adjustedsurv() is not doubly robust, and should use method="aiptw" with treatment_model= instead. /edit

Using Rubin's rule I managed to pool the data and capture both the within variance in each dataset as well as the between-each-imputed-dataset variance. And thereby managed to create a data table ready for a KM curve:

#Rubin's rule to calculate CIs for each time point:
# Convert SE to within variance
for (i in 1:length(datasets)){
  cox_data[[i]]$w_var <- cox_data[[i]]$se^2
}

# Combine all data tables into one large data table and use dplyr for the Rubin's Rule steps:
cox_data_comb <- rbindlist(cox_data, idcol = "source_id")

# For the 50 datasets...
result <- cox_data_comb %>%
  group_by(group, time) %>% # ...For each timepoint per treatment...
  mutate(b_var = var(surv)) %>%  # ...take the var of the mean surv prob to get the between 50 datasets var
  mutate(t_var = w_var + (1 + 1/length(datasets)) * b_var) %>%  # ...and calculate the total variance
  mutate(pooled_se = sqrt(t_var)) %>%  # Create pooled SE
  mutate(lower_bound = surv - z_value * pooled_se) %>% # ...and pooled lower CI
  mutate(upper_bound = surv + z_value * pooled_se) %>% # ...and pooled upper CI
  ungroup()

And plot the results in a custom KM curve using ggplot2 and the results table:

ggplot(result, aes(x = time, y = surv, color = as.factor(group), group = group)) +
  geom_line(size = 1) +  # Line for mean values
  # ribbon for CIs:
  geom_ribbon(aes(ymin = lower_bound, ymax = upper_bound, fill = as.factor(group)),
              alpha = 0.2, color = NA) +      # Shaded area for CI
  scale_color_manual(values = c("blue", "red"), name = "Treatment", labels=c('drug A', 'drug B')) + # Custom colors
  scale_fill_manual(values = c("blue", "red"), name = "Treatment", labels=c('drug A', 'drug B')) +
  labs(title = "Probability of experiencing a first event",
       x = "Time (days)",
       y = "Probability",
       color = "Treatment",
       fill = "Treatment") +
  theme_minimal() +
  theme(legend.position = "top")

Lovely innit? But what about the Risk table that's usually plotted below the curve?

I know that the risk table is hidden in each adjustedsurv() object, due to the possibility of plotting the risk_table in its follow-up function plot.adjustedsurv(), but I can't find it nor do I know how to pool these 50 graphical risk tables into one graphical risk table.

I am also thinking of creating at_risk tables myself using data.table by calculating probability at time t * size of each treatment arm (again the strata is on treatment). Could the gt package be of help here? I cannot find anything in gt regarding risk tables or longitudinal data in wide-format.

After performing multiple imputation using R's mice package, I have 50 datasets with time-to-event data for two treatments: drug A & drug B. I performed a Cox proportional hazards model coxph() with inverse probability of treatment weights (IPtW) on each of the dataset, to retrieve the Hazards Ratio (HR) and for survival analysis (KM curve). For the adjusted KM curve I use the adjustedCurves R package, and store the data ($adj) of each of the 50 adjustedsurv() objects in a list.

Edit: Note that I can only calculate my IPt weights after multiple imputation: one of the variables (score) that contained NA values is used to calculate the weights:

library(ggplot2)
library(MASS)
library(data.table)
library(adjustedCurves)
library(survival)
library(survminer)

# datasets is a list containing 50 imputed datasets 
for (i in 1:length(datasets)){
    # create IPtW denominator Pr[A|L] using binomial model (treat=0 or 1)
    ipw.denom <- glm(treatment  ~ period + score + age + sex,
          family=binomial(link="logit"),
          data=datasets[[i]])

    # create IPtW numerator Pr[A] using binomial model (treat=0 or 1)
    ipw.num <- glm(treatment  ~ 1,
               family=binomial(link="logit"),
               data=datasets[[i]])

    # predict for each subject
    datasets[[i]]$ipw.num <- predict(ipw.num, datasets[[i]], type="response")
    datasets[[i]]$ipw.denom <- predict(ipw.denom, datasets[[i]], type="response")

    # create stabilized weights for each subject
    #if treatment=1 -> ipw.num/ipw.denom
    #if treatment=0 -> (1-ipw.num)/(1-ipw.denom)
    datasets[[i]]$sw <- ifelse(datasets[[i]]$treatment==1,
                               datasets[[i]]$ipw.num/datasets[[i]]$ipw.denom,
                               (1-datasets[[i]]$ipw.num)/(1-datasets[[i]]$ipw.denom))

    # For each dataset, apply a cox proportional hazards model
    cox_list[[i]] <- coxph(
        formula = Surv(time_to_event, outcome) ~ treatment + as.factor(period) + score + age + sex,
        data = datasets[[i]],  
        weights = datasets[[i]]$sw, x=TRUE)

    # Use the adjustedCurve package to create adjusted KM curves:
    adjsurv <- adjustedsurv(data=datasets[[i]],
                        variable="treatment",
                        ev_time="time_to_event",
                        event="outcome",
                        method="direct",
                        outcome_model=cox_list[[i]],
                        conf_int=TRUE)

    # Store the data, to apply Rubin's rule
    cox_data[[i]] <- adjsurv$adj

Edit: I was notfied by @Denzo that this approach in adjustedsurv() is not doubly robust, and should use method="aiptw" with treatment_model= instead. /edit

Using Rubin's rule I managed to pool the data and capture both the within variance in each dataset as well as the between-each-imputed-dataset variance. And thereby managed to create a data table ready for a KM curve:

#Rubin's rule to calculate CIs for each time point:
# Convert SE to within variance
for (i in 1:length(datasets)){
  cox_data[[i]]$w_var <- cox_data[[i]]$se^2
}

# Combine all data tables into one large data table and use dplyr for the Rubin's Rule steps:
cox_data_comb <- rbindlist(cox_data, idcol = "source_id")

# For the 50 datasets...
result <- cox_data_comb %>%
  group_by(group, time) %>% # ...For each timepoint per treatment...
  mutate(b_var = var(surv)) %>%  # ...take the var of the mean surv prob to get the between 50 datasets var
  mutate(t_var = w_var + (1 + 1/length(datasets)) * b_var) %>%  # ...and calculate the total variance
  mutate(pooled_se = sqrt(t_var)) %>%  # Create pooled SE
  mutate(lower_bound = surv - z_value * pooled_se) %>% # ...and pooled lower CI
  mutate(upper_bound = surv + z_value * pooled_se) %>% # ...and pooled upper CI
  ungroup()

And plot the results in a custom KM curve using ggplot2 and the results table:

ggplot(result, aes(x = time, y = surv, color = as.factor(group), group = group)) +
  geom_line(size = 1) +  # Line for mean values
  # ribbon for CIs:
  geom_ribbon(aes(ymin = lower_bound, ymax = upper_bound, fill = as.factor(group)),
              alpha = 0.2, color = NA) +      # Shaded area for CI
  scale_color_manual(values = c("blue", "red"), name = "Treatment", labels=c('drug A', 'drug B')) + # Custom colors
  scale_fill_manual(values = c("blue", "red"), name = "Treatment", labels=c('drug A', 'drug B')) +
  labs(title = "Probability of experiencing a first event",
       x = "Time (days)",
       y = "Probability",
       color = "Treatment",
       fill = "Treatment") +
  theme_minimal() +
  theme(legend.position = "top")

Lovely innit? But what about the Risk table that's usually plotted below the curve?

I know that the risk table is hidden in each adjustedsurv() object, due to the possibility of plotting the risk_table in its follow-up function plot.adjustedsurv(), but I can't find it nor do I know how to pool these 50 graphical risk tables into one graphical risk table.

I am also thinking of creating at_risk tables myself using data.table by calculating probability at time t * size of each treatment arm (again the strata is on treatment). Could the gt package be of help here? I cannot find anything in gt regarding risk tables or longitudinal data in wide-format.

Share Improve this question edited Nov 26, 2024 at 10:20 Stefan Verweij asked Nov 21, 2024 at 16:00 Stefan VerweijStefan Verweij 1722 silver badges15 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 1

Both multiple imputation and the subsequent plotting of risk tables are supported directly in adjustedsurv(), so you really don't have to do all of that. You can do this instead:

  • 1.) Fit one Cox-model per imputed dataset using the with() function as described in the mice package.
  • 2.) Supply the mids object including the multiple imputed datasets directly to the data argument of adjustedsurv() and the mira object from step 1.) to the outcome_model argument.
  • 3.) Just call plot() on the output object, it will directly perform the pooling for both the survival curves themselves and the risk table.

A small example:

library(adjustedCurves)
library(survival)
library(mice)

set.seed(42)

# simulate some example data with missing values in x1
sim_dat <- sim_confounded_surv(n=250, max_t=1.2)
sim_dat$group <- as.factor(sim_dat$group)
sim_dat$x1 <- ifelse(runif(n=250) < 0.5, sim_dat$x1, NA)

# perform multiple imputation
mids <- mice(data=sim_dat, method="pmm", m=50, printFlag=FALSE)

# fit a cox model for each dataset
mira <- with(mids, coxph(Surv(time, event) ~ x1 + x2 + x4 + x5 + group, x=TRUE))

# estimate and plot adjusted survival curves
adj <- adjustedsurv(data=mids,
                    variable="group",
                    ev_time="time",
                    event="event",
                    method="direct",
                    outcome_model=mira)
plot(adj, risk_table=TRUE)

I have two further comments on your analysis. First, if you want to combine IPt weights (or propensity scores) with an outcome model, you should probably use method="aiptw" or method="aiptw_pseudo" instead. In particular, using a Cox model fitted with weights in the way you did is not doubly-robust and may lead to bias, even if either the model or the weights were correctly estimated. This is described in the associated article (https://doi.org/10.1002/sim.9681).

Secondly, risk tables may be misleading when using methods other than "km", "iptw_km" or "iptw_cox", because all methods other than these 3 do not rely (soley) on the numbers displayed in the table to estimate the adjusted survival curves. See ?plot.adjustedsurv for a more detailed discussion.

Also, just as a side-note, the risk-table is not actually included in the output of the adjustedsurv() function. It is calculated only when plot.adjustedsurv() is called, using the information included in the output of adjustedsurv() (which is why the output includes the input data etc.).

EDIT:

To use the same approach for method="aiptw" you can simply estimate propensity score models for each imputed dataset:

mira2 <- with(mids, glm(group ~ x1 + x2 + x4 + x5, family="binomial"))

and then supply those to the appropriate argument:

adj <- adjustedsurv(data=mids,
                    variable="group",
                    ev_time="time",
                    event="event",
                    method="aiptw",
                    outcome_model=mira,
                    treatment_model=mira2)
plot(adj, risk_table=TRUE)

The AIPTW method does not use weights directly, it uses the propensity score estimated using the logistic regression models. There is no method implemented in adjustedCurves (or elsewhere) that uses weights and an outcome model to estimate adjusted survival curves. They all use propensity scores. If you want to use only stabilized weights and are willing to skip the Cox model, you could of course do so using either method="iptw_km", method="iptw_cox" or method="iptw_pseudo". The documentation of the package includes explanations and examples for all of this.

本文标签: ggplot2Pool 50 risk tables from the Adjusted Curves package in RStack Overflow