--- title: "Getting Started with AIPW" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with AIPW} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, fig.width = 6, comment = "#>"#, # cache=TRUE ) ``` Contents: * [Installation](#Installation) * [One-line version](#one_line) * [Longer version](#details) + [Create an AIPW object](#constructor) + [Fit the object](#fit) + [Calculate average treatment effects](#ate) + [Calculate average treatment effects among the treated](#att) * [Parallelization](#par) * [Using tmle/tmle3 as input](#tmle_input) + [tmle](#tmle) + [tmle3](#tmle3) ## Installation 1. Install AIPW from CRAN or [GitHub](https://github.com/yqzhong7/AIPW) ```{r, eval = FALSE} # CRAN version install.packages("AIPW") # github version # install.packages("remotes") # remotes::install_github("yqzhong7/AIPW") ``` __* CRAN version only supports SuperLearner and tmle. Please install the Github version (master branch) to use sl3 and tmle3.__ 2. Install [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) or [sl3](https://tlverse.org/sl3/articles/intro_sl3.html) ```{r, eval = FALSE} #SuperLearner install.packages("SuperLearner") #sl3 remotes::install_github("tlverse/sl3") install.packages("Rsolnp") ``` ## Input data for analyses ```{r example data} library(AIPW) library(SuperLearner) library(ggplot2) set.seed(123) data("eager_sim_obs") cov = c("eligibility","loss_num","age", "time_try_pregnant","BMI","meanAP") ``` ## Using AIPW to estimate the average treatment effect ### One line version (Method chaining from R6class) Using native AIPW class allows users to define different covariate sets for the exposure and the outcome models, respectively. ```{r one_line} AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y, A= eager_sim_obs$sim_A, W= subset(eager_sim_obs,select=cov), Q.SL.library = c("SL.mean","SL.glm"), g.SL.library = c("SL.mean","SL.glm"), k_split = 10, verbose=FALSE)$ fit()$ #Default truncation summary(g.bound = c(0.025,0.975))$ plot.p_score()$ plot.ip_weights() ``` ### A more detailed tutorial #### 1. Create an AIPW object * ##### Use [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) libraries ```{r SuperLearner, message=FALSE,eval=F} library(AIPW) library(SuperLearner) #SuperLearner libraries for outcome (Q) and exposure models (g) sl.lib <- c("SL.mean","SL.glm") #construct an aipw object for later estimations AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y, A= eager_sim_obs$sim_A, W= subset(eager_sim_obs,select=cov), Q.SL.library = sl.lib, g.SL.library = sl.lib, k_split = 10, verbose=FALSE) ``` * ##### Use [sl3](https://tlverse.org/sl3/index.html) libraries __New GitHub versions (after v0.6.3.1) no longer support sl3 and tmle3. If you are still interested in using the version with sl3 and tmle3 support, please install `remotes::install_github("yqzhong7/AIPW@aje_version")`__ Metalearner is required to combine the estimates from stacklearner! ```{r sl3, eval=F} library(AIPW) library(sl3) ##construct sl3 learners for outcome (Q) and exposure models (g) lrnr_glm <- Lrnr_glm$new() lrnr_mean <- Lrnr_mean$new() #stacking two learner (this will yield estimates for each learner) stacklearner <- Stack$new(lrnr_glm, lrnr_mean) #metalearner is required to combine the estimates from stacklearner metalearner <- Lrnr_nnls$new() sl3.lib <- Lrnr_sl$new(learners = stacklearner, metalearner = metalearner) #construct an aipw object for later estimations AIPW_sl3 <- AIPW$new(Y= eager_sim_obs$sim_Y, A= eager_sim_obs$sim_A, W= subset(eager_sim_obs,select=cov), Q.SL.library = sl3.lib, g.SL.library = sl3.lib, k_split = 10, verbose=FALSE) ``` If outcome is missing, analysis assumes missing at random (MAR) by estimating propensity scores with I(A=a, observed=1). Missing exposure is not supported. #### 2. Fit the AIPW object This step will fit the data stored in the AIPW object to obtain estimates for later average treatment effect calculations. ```{r} #fit the AIPW_SL object AIPW_SL$fit() # or you can use stratified_fit # AIPW_SL$stratified_fit() ``` #### 3. Calculate average treatment effects * ##### Estimate the ATE with propensity scores truncation ```{r} #estimate the average causal effects from the fitted AIPW_SL object AIPW_SL$summary(g.bound = 0.025) #propensity score truncation ``` * ##### Check the balance of propensity scores and inverse probability weights by exposure status after truncation ```{r ps_trunc} library(ggplot2) AIPW_SL$plot.p_score() AIPW_SL$plot.ip_weights() ``` #### 4. Calculate average treatment effects among the treated/controls * ##### `stratified_fit()` fits the outcome model by exposure status while `fit()` does not. Hence, `stratified_fit()` must be used to compute ATT/ATC [(Kennedy et al. 2015)](http://www.ehkennedy.com/uploads/5/8/4/5/58450265/2015_kennedy_et_al_-_semiparametric_causal_inference_in_matched_cohort_studies.pdf) ```{r} suppressWarnings({ AIPW_SL$stratified_fit()$summary() }) ``` ## Parallelization with future.apply In default setting, the `AIPW$fit()` method will be run sequentially. The current version of AIPW package supports parallel processing implemented by [future.apply](https://github.com/HenrikBengtsson/future.apply) package under the [future](https://github.com/HenrikBengtsson/future) framework. Before creating a `AIPW` object, simply use `future::plan()` to enable parallelization and `set.seed()` to take care of the random number generation (RNG) problem: ```{r parallel, eval=FALSE} # install.packages("future.apply") library(future.apply) plan(multiprocess, workers=2, gc=T) set.seed(888) AIPW_SL <- AIPW$new(Y= eager_sim_obs$sim_Y, A= eager_sim_obs$sim_A, W= subset(eager_sim_obs,select=cov), Q.SL.library = sl3.lib, g.SL.library = sl3.lib, k_split = 10, verbose=FALSE)$fit()$summary() ``` ## Use `tmle`/`tmle3` fitted object as input AIPW shares similar intermediate estimates (nuisance functions) with the Targeted Maximum Likelihood / Minimum Loss-Based Estimation (TMLE). Therefore, `AIPW_tmle` class is designed for using `tmle`/`tmle3` fitted object as input. Details about these two packages can be found [here](https://www.jstatsoft.org/article/view/v051i13) and [here](https://tlverse.org/tlverse-handbook/). This feature is designed for debugging and easy comparisons across these three packages because cross-fitting procedures are different in `tmle` and `tmle3`. In addition, this feature does not support ATT outputs. #### 1. `tmle` As shown in the message, [tmle](https://CRAN.R-project.org/package=tmle) only support cross-fitting in the outcome model. ```{r tmle, eval=F} # install.packages("tmle") library(tmle) library(SuperLearner) tmle_fit <- tmle(Y=eager_sim_obs$sim_Y, A=eager_sim_obs$sim_A, W=eager_sim_obs[,-1:-2], Q.SL.library=c("SL.mean","SL.glm"), g.SL.library=c("SL.mean","SL.glm"), family="binomial", cvQinit = TRUE) cat("\nEstimates from TMLE\n") unlist(tmle_fit$estimates$ATE) unlist(tmle_fit$estimates$RR) unlist(tmle_fit$estimates$OR) cat("\nEstimates from AIPW\n") a_tmle <- AIPW_tmle$ new(A=eager_sim_obs$sim_A,Y=eager_sim_obs$sim_Y,tmle_fit = tmle_fit,verbose = TRUE)$ summary(g.bound=0.025) ``` #### 2. `tmle3` __New GitHub versions (after v0.6.3.1) no longer support sl3 and tmle3. If you are still interested in using the version with sl3 and tmle3 support, please install `remotes::install_github("yqzhong7/AIPW@aje_version")`__ Notably, [tmle3](https://github.com/tlverse/tmle3) conducts cross-fitting and propensity truncation (0.025) by default. ```{r tmle3, eval=F} # remotes::install_github("tlverse/tmle3") library(tmle3,quietly = TRUE) library(sl3,quietly = TRUE) node_list <- list(A = "sim_A",Y = "sim_Y",W = colnames(eager_sim_obs)[-1:-2]) or_spec <- tmle_OR(baseline_level = "0",contrast_level = "1") tmle_task <- or_spec$make_tmle_task(eager_sim_obs,node_list) lrnr_glm <- make_learner(Lrnr_glm) lrnr_mean <- make_learner(Lrnr_mean) sl <- Lrnr_sl$new(learners = list(lrnr_glm,lrnr_mean)) learner_list <- list(A = sl, Y = sl) tmle3_fit <- tmle3(or_spec, data=eager_sim_obs, node_list, learner_list) cat("\nEstimates from TMLE\n") tmle3_fit$summary # parse tmle3_fit into AIPW_tmle class cat("\nEstimates from AIPW\n") a_tmle3<- AIPW_tmle$ new(A=eager_sim_obs$sim_A,Y=eager_sim_obs$sim_Y,tmle_fit = tmle3_fit,verbose = TRUE)$ summary(g.bound=0) ```