I am using the tidysynth package in R to build a synthetic control model. I am confused about how to determine the minimum lift needed for the effect to be detected as significant.

My current way of measuring lift and significance is fairly roundabout. Here is the methodology:

Run the synthetic control:

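```r
library(tidysynth)
library(dplyr)

# All pre-treatment weeks (assumes `week` values fall on the same weekday as
# these dates). The windows appear to be matched as a set of time values, as
# in tidysynth's 1980:1988 examples, so pass every week, not just the endpoints.
pre_weeks <- seq(as.Date("2023-06-18"), as.Date("2024-09-29"), by = "week")

synth_data <- weekly_data %>%
  synthetic_control(outcome = thirty_day_retention_rate,
                    unit = unit_city,
                    time = week,
                    i_unit = "city_c",                # Treated city
                    i_time = as.Date("2024-10-06"),   # Intervention date
                    generate_placebos = TRUE) %>%     # Placebo fit for every donor
  generate_predictor(time_window = pre_weeks,
                     # Pre-treatment mean of the outcome as the only predictor
                     Pre_Treatment_Retention = mean(thirty_day_retention_rate)) %>%
  generate_weights(optimization_window = pre_weeks) %>%
  generate_control()
```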

From there, I get the significance results:

```r
synth_data %>% grab_significance()
```

```
unit_name  type     pre_mspe      post_mspe     mspe_ratio  rank  fishers_exact_pvalue  z_score
<chr>      <chr>    <dbl>         <dbl>         <dbl>       <int> <dbl>                 <dbl>
city_a     Donor    6.672489e-05  3.428673e-05  0.51385216  1     0.1428571              1.9618496
city_b     Donor    1.929205e-04  4.786515e-05  0.24810811  2     0.2857143              0.4036121
city_c     Treated  3.999151e-05  9.175422e-06  0.22943424  3     0.4285714              0.2941146
city_d     Donor    1.353589e-03  1.409032e-04  0.10409606  4     0.5714286             -0.4408281
city_e     Donor    1.378489e-04  1.201831e-05  0.08718464  5     0.7142857             -0.5399912
city_f     Donor    1.475206e-03  6.786372e-05  0.04600289  6     0.8571429             -0.7814677
city_g     Donor    2.151765e-04  5.648495e-06  0.02625052  7     1.0000000             -0.8972893
```
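For reference, the `fishers_exact_pvalue` and `z_score` columns can be reproduced from the MSPE columns in the table above using base R. This sketch assumes, based only on the printed numbers matching and not on the tidysynth source, that the p-value is the unit's rank divided by the number of units, and that the z-score is the MSPE ratio standardized by the mean and sample standard deviation of all seven ratios:

```r
# Pre- and post-period MSPE copied from the grab_significance() output
sig <- data.frame(
  unit_name = c("city_a", "city_b", "city_c", "city_d", "city_e", "city_f", "city_g"),
  type      = c("Donor", "Donor", "Treated", "Donor", "Donor", "Donor", "Donor"),
  pre_mspe  = c(6.672489e-05, 1.929205e-04, 3.999151e-05, 1.353589e-03,
                1.378489e-04, 1.475206e-03, 2.151765e-04),
  post_mspe = c(3.428673e-05, 4.786515e-05, 9.175422e-06, 1.409032e-04,
                1.201831e-05, 6.786372e-05, 5.648495e-06)
)

# Post- vs pre-treatment fit error for every unit (treated + placebos)
sig$mspe_ratio <- sig$post_mspe / sig$pre_mspe

# Rank 1 = largest ratio; the permutation p-value is rank / number of units
sig$rank <- rank(-sig$mspe_ratio, ties.method = "first")
sig$fishers_exact_pvalue <- sig$rank / nrow(sig)

# Distance of each ratio from the mean ratio, in units of the sample SD
sig$z_score <- (sig$mspe_ratio - mean(sig$mspe_ratio)) / sd(sig$mspe_ratio)

print(sig[order(sig$rank), c("unit_name", "mspe_ratio", "rank",
                             "fishers_exact_pvalue", "z_score")])
```

Under that reading, the z-score only describes where the treated city's ratio sits among these seven ratios; it is not a statistic with a known standard normal distribution, so a 1.96 cutoff is at best a rough heuristic here.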

To determine significance, I look at the treated city's z-score, in this case 0.2941146. Since it is below 1.96, I conclude that the effect is not significant.
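Because the placebo test ranks the treated city against only seven units (itself plus six donors), the rank-based p-value can only take the values 1/7, 2/7, ..., 7/7. A small base-R sketch of that arithmetic; `min_pvalue()` and `units_needed()` are hypothetical helpers, not tidysynth functions:

```r
# Smallest permutation p-value attainable: the treated unit ranks first
# among all n_units (treated + placebos)
min_pvalue <- function(n_units) 1 / n_units

# Total units needed so that a first-place rank gives p <= alpha
# (the small tolerance guards against floating-point error in 1 / alpha)
units_needed <- function(alpha) ceiling(1 / alpha - 1e-9)

min_pvalue(7)        # 1/7, about 0.143 (1 treated + 6 donors)
units_needed(0.05)   # 20
units_needed(0.10)   # 10
```

With this donor pool, even a first-place rank gives p ≈ 0.143, so the rank test cannot reach a 95% level no matter how large the lift is; roughly 20 units in total would be needed before p = 0.05 is attainable.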

Then, to measure the size of the lift, I compare the observed outcome with the synthetic control:

```r
post_treatment_lift <- synth_data %>%
  grab_synthetic_control() %>%
  filter(time_unit >= as.Date("2024-10-06")) %>%   # Keep post-treatment weeks only
  mutate(
    lift = real_y - synth_y,                       # Absolute lift (outcome units)
    relative_lift = (lift / synth_y) * 100         # Relative lift (% of synthetic)
  ) %>%
  select(time_unit, real_y, synth_y, lift, relative_lift)

avg_relative_lift <- mean(post_treatment_lift$relative_lift)
avg_relative_lift
```

```
-0.166345
```
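To keep the units straight, here is a base-R sketch on hypothetical numbers (not the question's data) standing in for `grab_synthetic_control()` output, assuming retention is stored on a 0-1 scale. It separates the absolute lift in percentage points from the relative lift in percent, which is what `relative_lift` above measures:

```r
# Hypothetical post-treatment weeks; retention assumed to be on a 0-1 scale
post <- data.frame(
  real_y  = c(0.412, 0.405, 0.398, 0.401),
  synth_y = c(0.413, 0.404, 0.400, 0.401)
)

# Absolute lift in percentage points (difference of rates x 100)
post$lift_ppt <- (post$real_y - post$synth_y) * 100

# Relative lift in percent of the synthetic control
post$relative_lift_pct <- (post$real_y - post$synth_y) / post$synth_y * 100

avg_lift_ppt          <- mean(post$lift_ppt)           # percentage points
avg_relative_lift_pct <- mean(post$relative_lift_pct)  # percent
```

The two averages differ in both scale and meaning, so a value computed as `relative_lift` should be reported as a percent change rather than in percentage points.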

So my interpretation is that the relative lift is about -0.17% (a relative change versus the synthetic control, not percentage points), with a z-score of 0.29, which corresponds to roughly 23% confidence (two-sided).
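The 23% figure can be reproduced by treating the z-score as standard normal and taking a two-sided p-value. This is the same normal assumption behind the 1.96 cutoff, and it is only approximate when the z-score is computed from seven units:

```r
z <- 0.2941146                   # treated city's z_score from grab_significance()

p_two_sided <- 2 * pnorm(-abs(z))  # about 0.77
confidence  <- 1 - p_two_sided     # about 0.23, the "23% confidence"

# z needed for 95% two-sided confidence under the same normal assumption
z_95 <- qnorm(0.975)               # about 1.96
```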

Since this approach effectively combines two separate methods (a placebo test for significance and an observed-vs-synthetic comparison for lift), what is the best way to determine the minimum lift we could detect with 95% confidence?
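One possible way to connect the two pieces, sketched here as an assumption rather than a tidysynth feature, is to ask how large a constant effect added to the treated city's post-treatment gap (`real_y - synth_y`) would have to be for its MSPE ratio to exceed every placebo ratio, i.e. to rank first. The hypothetical helper `min_rank1_effect()` solves that in closed form, assuming the synthetic-control weights, the treated pre-period MSPE and the placebo ratios do not change when the effect is injected:

```r
# Hypothetical helper (not part of tidysynth): smallest constant additive
# effect, in outcome units, that would push the treated unit's post/pre MSPE
# ratio above every placebo ratio, i.e. give it rank 1.
# Assumes the fitted weights, the treated pre-period MSPE and the placebo
# ratios stay the same when the effect is added.
min_rank1_effect <- function(post_gaps, pre_mspe, placebo_ratios,
                             direction = c("positive", "negative")) {
  direction <- match.arg(direction)
  # A negative effect on gaps g is a positive effect on -g
  g <- if (direction == "positive") post_gaps else -post_gaps
  target <- max(placebo_ratios) * pre_mspe  # post MSPE needed for rank 1
  m <- mean(g)
  q <- mean(g^2)                            # current post-period MSPE
  if (q > target) return(0)                 # already ranks first
  # Solve mean((g + delta)^2) = target, i.e. delta^2 + 2*m*delta + q - target = 0,
  # and keep the non-negative root
  delta <- -m + sqrt(m^2 - q + target)
  if (direction == "positive") delta else -delta
}

# Values from the significance table above
pre_mspe_treated <- 3.999151e-05
placebo_ratios   <- c(0.51385216, 0.24810811, 0.10409606,
                      0.08718464, 0.04600289, 0.02625052)

# Hypothetical post-period gaps; with real data use
# post_treatment_lift$real_y - post_treatment_lift$synth_y
gaps <- c(-0.004, 0.002, -0.003, 0.001)

min_rank1_effect(gaps, pre_mspe_treated, placebo_ratios, "positive")
min_rank1_effect(gaps, pre_mspe_treated, placebo_ratios, "negative")
```

Dividing the result by the mean of `synth_y` over the post-period expresses it as a relative lift. With seven units, a first-place rank still only corresponds to p = 1/7 ≈ 0.143, so this is the smallest effect the current placebo test could flag at all, not a 95% threshold.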

Tags: r, Figure out TidySynth Minimal Detectable Effect, Stack Overflow