---
title: "Repeated Cross-fitting"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Repeated Cross-fitting}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6
)
```

Contents:

* [Repeated Cross-fitting](#repfit)
    + [Create an AIPW object](#constructor)
    + [Decorate with `Repeated` class](#decorator)
* [More Repetitions vs More K-splits?](#whichbetter)

## Repeated Cross-fitting {#repfit}

A single cross-fitted estimate depends on the particular random split of the data. Repeated cross-fitting reduces this variability by repeating the cross-fitting procedure over many different random splits and summarizing the resulting estimates, as suggested by Chernozhukov et al. (2018).

### Create an AIPW object {#constructor}

```{r one_line}
library(AIPW)
library(SuperLearner)
library(ggplot2)
set.seed(123)
data("eager_sim_obs")
cov <- c("eligibility", "loss_num", "age",
         "time_try_pregnant", "BMI", "meanAP")

# Fit AIPW with 2-fold cross-fitting, then print the summary
AIPW_SL <- AIPW$new(Y = eager_sim_obs$sim_Y,
                    A = eager_sim_obs$sim_A,
                    W = subset(eager_sim_obs, select = cov),
                    Q.SL.library = c("SL.glm"),
                    g.SL.library = c("SL.glm"),
                    k_split = 2,
                    verbose = TRUE)$
  fit()$
  summary()
```

### Decorate with `Repeated` class {#decorator}

```{r refit}
# Create a new object from the fitted AIPW_SL
# (the Repeated class is an extension of the AIPW class)
repeated_aipw_sl <- Repeated$new(aipw_obj = AIPW_SL)
# Repeat the cross-fitting procedure `num_reps` times
repeated_aipw_sl$repfit(num_reps = 30, stratified = FALSE)
# Summarize the median estimate, the median SE, and the SE of the median
# estimate adjusted for the `num_reps` repetitions
repeated_aipw_sl$summary_median()
```

```{r check_refit}
# Check the distributions of the estimates and SEs from `num_reps` repetitions
s <- repeated_aipw_sl$repeated_estimates

ggplot(s, aes(x = Estimate)) +
  geom_histogram(bins = 10) +
  facet_grid(. ~ Estimand, scales = "free")

ggplot(s, aes(x = SE)) +
  geom_histogram(bins = 10) +
  facet_grid(. ~ Estimand, scales = "free")
```

### More `num_reps` vs More `k_split`? {#whichbetter}
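One rough way to frame the trade-off is to count how many nuisance-model fits a configuration requires and how many observations each fit is trained on. The helper `crossfit_budget()` below is hypothetical (it is not part of the AIPW package). It assumes that one outcome model and one propensity model are fit in each of the `k_split` folds, and it uses an illustrative sample size rather than the size of `eager_sim_obs`.

```r
# Hypothetical helper, not part of the AIPW package.
# Assumes one outcome model (Q) and one propensity model (g) per fold.
crossfit_budget <- function(n, k_split, num_reps) {
  stopifnot(k_split >= 2)  # cross-fitting needs at least two folds
  data.frame(
    k_split = k_split,
    num_reps = num_reps,
    # total SuperLearner fits over all repetitions
    # (each SuperLearner fit also runs its own internal cross-validation)
    nuisance_fits = 2 * k_split * num_reps,
    # approximate number of observations used to train each nuisance model
    train_size = floor(n * (k_split - 1) / k_split)
  )
}

n <- 1000  # illustrative sample size
budget <- rbind(
  crossfit_budget(n, k_split = 2, num_reps = 30),
  crossfit_budget(n, k_split = 10, num_reps = 6)
)
print(budget)
```

Under these assumptions, both configurations require 120 nuisance-model fits. With `k_split = 10`, however, each model is trained on about 900 observations instead of 500, whereas `k_split = 2` with 30 repetitions summarizes the estimate over many more random splits.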
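Whether to increase the number of repetitions (`num_reps`) or the number of folds (`k_split`) depends on several considerations:

1. Computational resources: each repetition reruns the full cross-fitting procedure, so the number of SuperLearner fits grows with both `num_reps` and `k_split`.
2. Sample size: with `k_split` folds, each nuisance model is trained on roughly (`k_split` - 1)/`k_split` of the data. In small samples, more folds leave more data for training.
3. Complexity of the SuperLearner algorithms: more flexible learners are slower to fit and generally benefit more from larger training sets.

### References

Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, Robins J (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.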