Title: | Augmented Inverse Probability Weighting |
---|---|
Description: | The 'AIPW' package implements the augmented inverse probability weighting, a doubly robust estimator, for average causal effect estimation with user-defined stacked machine learning algorithms. To cite the 'AIPW' package, please use: "Yongqi Zhong, Edward H. Kennedy, Lisa M. Bodnar, Ashley I. Naimi (2021). AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects. American Journal of Epidemiology. doi: 10.1093/aje/kwab207". Visit: <https://yqzhong7.github.io/AIPW/> for more information. |
Authors: | Yongqi Zhong [aut, cre] |
Maintainer: | Yongqi Zhong <[email protected]> |
License: | GPL-3 |
Version: | 0.6.9.1 |
Built: | 2025-03-02 05:20:41 UTC |
Source: | https://github.com/yqzhong7/aipw |
An R6Class of AIPW for estimating the average causal effects from users' inputs of exposure, outcome, covariates and related libraries for estimating the efficient influence function.
An AIPW object is constructed by new() with users' inputs of data and causal structures. It then fits the data with fit(), using the libraries in Q.SL.library and g.SL.library with k_split cross-fitting, and provides results via the summary() method.
After using the fit() and/or summary() methods, propensity scores and inverse probability weights by exposure status can be examined with plot.p_score() and plot.ip_weights(), respectively.
If the outcome is missing, the analysis assumes missing at random (MAR) by estimating propensity scores of I(A=a, observed=1) with all covariates W. (W.Q and W.g are disabled.) Missing exposure is not supported.
See examples for illustration.
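The missing-outcome handling described above can be pictured with a small base R sketch. It is only an illustration under stated assumptions: the data are simulated, plain logistic regressions stand in for the stacked learners, and the object names (`observed`, `ind1`, `ind0`, `g1`, `g0`) are hypothetical rather than taken from the package.

```r
set.seed(1)
n <- 500
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.3 * W))
Y <- rbinom(n, 1, plogis(-0.5 + A + 0.5 * W))

# Outcome is observed with a probability that depends on W only (MAR given W)
observed <- rbinom(n, 1, plogis(1.5 - 0.5 * W))
Y[observed == 0] <- NA

# Joint indicators I(A = a, observed = 1) for a = 1 and a = 0
ind1 <- as.integer(A == 1 & observed == 1)
ind0 <- as.integer(A == 0 & observed == 1)

# Logistic regressions on all covariates W stand in for the exposure model (assumption)
g1 <- fitted(glm(ind1 ~ W, family = binomial()))
g0 <- fitted(glm(ind0 ~ W, family = binomial()))

summary(g1)
summary(g0)
```

Under this reading, `g1` and `g0` play the role that the ordinary propensity scores play when the outcome is fully observed.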
AIPW object
AIPW$new(Y = NULL, A = NULL, W = NULL, W.Q = NULL, W.g = NULL, Q.SL.library = NULL, g.SL.library = NULL, k_split = 10, verbose = TRUE, save.sl.fit = FALSE)
Argument | Type | Details |
---|---|---|
Y | Integer | A vector of outcome (binary (0, 1) or continuous) |
A | Integer | A vector of binary exposure (0 or 1) |
W | Data | Covariates for both exposure and outcome models. |
W.Q | Data | Covariates for the outcome model (Q). |
W.g | Data | Covariates for the exposure model (g). |
Q.SL.library | SL.library | Algorithms used for the outcome model (Q). |
g.SL.library | SL.library | Algorithms used for the exposure model (g). |
k_split | Integer | Number of folds for splitting (Default = 10). |
verbose | Logical | Whether to print the result (Default = TRUE) |
save.sl.fit | Logical | Whether to save Q.fit and g.fit (Default = FALSE) |
W, W.Q & W.g
Each can be a vector, matrix or data.frame. If and only if W == NULL, W is replaced by W.Q and W.g.
Q.SL.library & g.SL.library
Machine learning algorithms from SuperLearner libraries or an sl3 learner object (Lrnr_base).
k_split
It ranges from 1 to the number of observations - 1. If k_split=1, no cross-fitting is used; if k_split>=2, cross-fitting is used (e.g., with k_split=10, 9/10 of the data is used to estimate and the remaining 1/10 to predict). NOTE: cross-fitting is recommended.
save.sl.fit
This option allows users to save the fitted sl objects (libs$Q.fit & libs$g.fit) for debugging. Warning: saving the fitted SuperLearner objects may use substantial storage/memory.
Methods | Details | Link |
---|---|---|
fit() | Fit the data to the AIPW object | fit.AIPW |
stratified_fit() | Fit the data to the AIPW object stratified by A | stratified_fit.AIPW |
summary() | Summary of the average treatment effects from AIPW | summary.AIPW_base |
plot.p_score() | Plot the propensity scores by exposure status | plot.p_score |
plot.ip_weights() | Plot the inverse probability weights using truncated propensity scores | plot.ip_weights |
Variable | Generated by | Return |
---|---|---|
n | Constructor | Number of observations |
stratified_fitted | stratified_fit() | Fit the outcome model stratified by exposure status |
obs_est | fit() & summary() | Components for calculating average causal effects |
estimates | summary() | A list of risk difference, risk ratio and odds ratio |
result | summary() | A matrix containing RD, ATT, ATC, RR and OR with their SE and 95% CI |
g.plot | plot.p_score() | A density plot of propensity scores by exposure status |
ip_weights.plot | plot.ip_weights() | A box plot of inverse probability weights |
libs | fit() | SuperLearner or sl3 libraries and their fitted objects |
sl.fit | Constructor | A wrapper function for fitting SuperLearner or sl3 |
sl.predict | Constructor | A wrapper function using sl.fit to predict |
stratified_fit
An indicator for whether the outcome model is fitted stratified by exposure status in the fit() method. Only when stratified_fit() is used to set stratified_fit = TRUE does summary output average treatment effects among the treated and the controls.
obs_est
After using the fit() and summary() methods, this list contains the propensity scores (p_score), counterfactual predictions (mu, mu1 & mu0) and efficient influence functions (aipw_eif1 & aipw_eif0) for later average treatment effect calculations.
g.plot
This plot is generated by ggplot2::geom_density.
ip_weights.plot
This plot uses truncated propensity scores stratified by exposure status (ggplot2::geom_boxplot).
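For orientation, the components listed under obs_est can be related through the textbook form of the AIPW estimator. The notation below is an assumption for illustration: $\hat{g}(W)$ denotes the (truncated) propensity score, $\hat{\mu}_a(W)$ the counterfactual prediction under $A = a$, and $\hat{\phi}_a$ the uncentered efficient influence function contribution. Whether aipw_eif1 and aipw_eif0 are stored in exactly this form is not stated in this manual.

```latex
\begin{align*}
\hat{\phi}_1(O_i) &= \frac{A_i}{\hat{g}(W_i)}\,\bigl\{Y_i - \hat{\mu}_1(W_i)\bigr\} + \hat{\mu}_1(W_i), \\
\hat{\phi}_0(O_i) &= \frac{1 - A_i}{1 - \hat{g}(W_i)}\,\bigl\{Y_i - \hat{\mu}_0(W_i)\bigr\} + \hat{\mu}_0(W_i), \\
\hat{\psi}_a &= \frac{1}{n}\sum_{i=1}^{n} \hat{\phi}_a(O_i), \qquad a \in \{0, 1\}, \\
\widehat{\mathrm{RD}} &= \hat{\psi}_1 - \hat{\psi}_0, \qquad
\widehat{\mathrm{RR}} = \frac{\hat{\psi}_1}{\hat{\psi}_0}, \qquad
\widehat{\mathrm{OR}} = \frac{\hat{\psi}_1 / (1 - \hat{\psi}_1)}{\hat{\psi}_0 / (1 - \hat{\psi}_0)}.
\end{align*}
```

The estimator is doubly robust in the usual sense: $\hat{\psi}_a$ is consistent if either the exposure model or the outcome model is correctly specified.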
Zhong Y, Kennedy EH, Bodnar LM, Naimi AI (2021). AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects. American Journal of Epidemiology.
Robins JM, Rotnitzky A (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association.
Chernozhukov V, Chetverikov D, Demirer M, et al (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.
Kennedy EH, Sjolander A, Small DS (2015). Semiparametric causal inference in matched cohort studies. Biometrika.
library(AIPW)
library(SuperLearner)
library(ggplot2)

# create an object
aipw_sl <- AIPW$new(Y = rbinom(100, 1, 0.5),
                    A = rbinom(100, 1, 0.5),
                    W.Q = rbinom(100, 1, 0.5),
                    W.g = rbinom(100, 1, 0.5),
                    Q.SL.library = "SL.mean",
                    g.SL.library = "SL.mean",
                    k_split = 1,
                    verbose = FALSE)
# fit the object
aipw_sl$fit()
# or use `aipw_sl$stratified_fit()` to estimate ATE and ATT/ATC
# calculate the results
aipw_sl$summary(g.bound = 0.025)
# check the propensity scores by exposure status after truncation
aipw_sl$plot.p_score()
AIPW_nuis class for users to manually input nuisance functions (estimates from the exposure and the outcome models)
Create an AIPW_nuis object that uses users' input nuisance functions from the exposure model (raw_p_score) and the outcome models (mu0 and mu1).
Note: If outcome is missing, replace (A=a) with (A=a, observed=1) when estimating the propensity scores.
AIPW_nuis object
AIPW_nuis$new(Y = NULL, A = NULL, mu0 = NULL, mu1 = NULL, raw_p_score = NULL, verbose = TRUE, stratified_fitted = FALSE)
Argument | Type | Details |
---|---|---|
Y | Integer | A vector of outcome (binary (0, 1) or continuous) |
A | Integer | A vector of binary exposure (0 or 1) |
mu0 | Numeric | User input of the outcome model prediction under A = 0 |
mu1 | Numeric | User input of the outcome model prediction under A = 1 |
raw_p_score | Numeric | User input of the propensity score from the exposure model, before truncation |
verbose | Logical | Whether to print the result (Default = TRUE) |
stratified_fitted | Logical | Whether mu0 & mu1 were estimated using only observations with A=0 & A=1, respectively (Default = FALSE) |
Methods | Details | Link |
---|---|---|
summary() | Summary of the average treatment effects from AIPW | summary.AIPW_base |
plot.p_score() | Plot the propensity scores by exposure status | plot.p_score |
plot.ip_weights() | Plot the inverse probability weights using truncated propensity scores | plot.ip_weights |
Variable | Generated by | Return |
---|---|---|
n | Constructor | Number of observations |
obs_est | Constructor | Components for calculating average causal effects |
estimates | summary() | A list of risk difference, risk ratio and odds ratio |
result | summary() | A matrix containing RD, ATT, ATC, RR and OR with their SE and 95% CI |
g.plot | plot.p_score() | A density plot of propensity scores by exposure status |
ip_weights.plot | plot.ip_weights() | A box plot of inverse probability weights |
stratified_fit
An indicator for whether the outcome model is fitted stratified by exposure status in the fit() method. Only when stratified_fit() is used to set stratified_fit = TRUE does summary output average treatment effects among the treated and the controls.
obs_est
This list includes propensity scores (p_score), counterfactual predictions (mu, mu1 & mu0) and efficient influence functions (aipw_eif1 & aipw_eif0).
g.plot
This plot is generated by ggplot2::geom_density.
ip_weights.plot
This plot uses truncated propensity scores stratified by exposure status (ggplot2::geom_boxplot).
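As a rough illustration of what user-supplied nuisance estimates look like, the base R sketch below produces vectors named after the mu0, mu1 and raw_p_score arguments and combines them into an AIPW risk difference by hand. It rests on assumptions: the data are simulated, simple logistic regressions replace the stacked learners, and the hand-written estimator follows the textbook AIPW form rather than the package's internal code.

```r
set.seed(42)
n <- 1000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Y <- rbinom(n, 1, plogis(-0.3 + 0.8 * A + 0.6 * W))
dat <- data.frame(Y = Y, A = A, W = W)

# Nuisance models (assumption: plain GLMs instead of SuperLearner/sl3)
g_fit <- glm(A ~ W, family = binomial(), data = dat)
Q_fit <- glm(Y ~ A + W, family = binomial(), data = dat)

# Nuisance estimates, named after the AIPW_nuis arguments
raw_p_score <- predict(g_fit, type = "response")
mu1 <- predict(Q_fit, newdata = transform(dat, A = 1), type = "response")
mu0 <- predict(Q_fit, newdata = transform(dat, A = 0), type = "response")

# Textbook AIPW contributions for each exposure level
eif1 <- A / raw_p_score * (Y - mu1) + mu1
eif0 <- (1 - A) / (1 - raw_p_score) * (Y - mu0) + mu0

# Risk difference with an influence-function-based SE and Wald 95% CI
rd <- mean(eif1) - mean(eif0)
se_rd <- sd(eif1 - eif0) / sqrt(n)
ci <- rd + c(-1, 1) * qnorm(0.975) * se_rd
round(c(RD = rd, SE = se_rd, lower = ci[1], upper = ci[2]), 3)
```

Going by the argument table above, the same vectors would presumably be handed to the class as `AIPW_nuis$new(Y = Y, A = A, mu0 = mu0, mu1 = mu1, raw_p_score = raw_p_score)`, after which `summary()` reports the effects.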
AIPW_tmle class uses a fitted tmle or tmle3 object as input
Create an AIPW_tmle object that uses the estimated efficient influence function from a fitted tmle or tmle3 object
AIPW_tmle object
AIPW_tmle$new(Y = NULL, A = NULL, tmle_fit = NULL, verbose = TRUE)
Argument | Type | Details |
---|---|---|
Y | Integer | A vector of outcome (binary (0, 1) or continuous) |
A | Integer | A vector of binary exposure (0 or 1) |
tmle_fit | Object | A fitted tmle or tmle3 object |
verbose | Logical | Whether to print the result (Default = TRUE) |
Methods | Details | Link |
---|---|---|
summary() | Summary of the average treatment effects from AIPW | summary.AIPW_base |
plot.p_score() | Plot the propensity scores by exposure status | plot.p_score |
plot.ip_weights() | Plot the inverse probability weights using truncated propensity scores | plot.ip_weights |
Variable | Generated by | Return |
---|---|---|
n | Constructor | Number of observations |
obs_est | Constructor | Components for calculating average causal effects |
estimates | summary() | A list of risk difference, risk ratio and odds ratio |
result | summary() | A matrix containing RD, ATT, ATC, RR and OR with their SE and 95% CI |
g.plot | plot.p_score() | A density plot of propensity scores by exposure status |
ip_weights.plot | plot.ip_weights() | A box plot of inverse probability weights |
obs_est
This list is extracted from the fitted tmle or tmle3 object. It includes propensity scores (p_score), counterfactual predictions (mu, mu1 & mu0) and efficient influence functions (aipw_eif1 & aipw_eif0).
g.plot
This plot is generated by ggplot2::geom_density.
ip_weights.plot
This plot uses truncated propensity scores stratified by exposure status (ggplot2::geom_boxplot).
## Not run:
library(AIPW)
vec <- function() sample(0:1, 100, replace = TRUE)
df <- data.frame(replicate(4, vec()))
names(df) <- c("A", "Y", "W1", "W2")

## From tmle
library(tmle)
library(SuperLearner)
tmle_fit <- tmle(Y = df$Y, A = df$A, W = subset(df, select = c("W1", "W2")),
                 Q.SL.library = "SL.glm",
                 g.SL.library = "SL.glm",
                 family = "binomial")
AIPW_tmle$new(A = df$A, Y = df$Y, tmle_fit = tmle_fit, verbose = TRUE)$summary()

## From tmle3
# tmle3 simple implementation
library(tmle3)
library(sl3)
node_list <- list(A = "A", Y = "Y", W = c("W1", "W2"))
or_spec <- tmle_OR(baseline_level = "0", contrast_level = "1")
tmle_task <- or_spec$make_tmle_task(df, node_list)
lrnr_glm <- make_learner(Lrnr_glm)
sl <- Lrnr_sl$new(learners = list(lrnr_glm))
learner_list <- list(A = sl, Y = sl)
tmle3_fit <- tmle3(or_spec, data = df, node_list, learner_list)

# parse tmle3_fit into AIPW_tmle class
AIPW_tmle$new(A = df$A, Y = df$Y, tmle_fit = tmle3_fit, verbose = TRUE)$summary()
## End(Not run)
A wrapper function for AIPW$new()$fit()$summary()
aipw_wrapper( Y, A, verbose = TRUE, W = NULL, W.Q = NULL, W.g = NULL, Q.SL.library, g.SL.library, k_split = 10, g.bound = 0.025, stratified_fit = FALSE )
Y | Outcome (binary integer: 0 or 1) |
A | Exposure (binary integer: 0 or 1) |
verbose | Whether to print the result (logical; Default = TRUE) |
W | Covariates for both exposure and outcome models (vector, matrix or data.frame). If NULL, this function will seek inputs from W.Q and W.g. |
W.Q | Only valid when W is NULL. Covariates for the outcome model. |
W.g | Only valid when W is NULL. Covariates for the exposure model. |
Q.SL.library | SuperLearner libraries or sl3 learner object (Lrnr_base) for the outcome model |
g.SL.library | SuperLearner libraries or sl3 learner object (Lrnr_base) for the exposure model |
k_split | Number of splits (integer; range: from 1 to the number of observations - 1): if k_split=1, no cross-fitting; if k_split>=2, cross-fitting is used (e.g., k_split=10 uses 9/10 of the data to estimate and the remaining 1/10 to predict). |
g.bound | Value between [0,1] at which the propensity score should be truncated. Defaults to 0.025. |
stratified_fit | An indicator for whether the outcome model is fitted stratified by exposure status in the fit() method. |
A fitted AIPW object with summarised results
library(AIPW)
library(SuperLearner)
aipw_sl <- aipw_wrapper(Y = rbinom(100, 1, 0.5),
                        A = rbinom(100, 1, 0.5),
                        W.Q = rbinom(100, 1, 0.5),
                        W.g = rbinom(100, 1, 0.5),
                        Q.SL.library = "SL.mean",
                        g.SL.library = "SL.mean",
                        k_split = 1,
                        verbose = FALSE)
Datasets were simulated using baseline covariates (sampling with replacement) from the Effects of Aspirin in Gestation and Reproduction (EAGeR) study. Data generating mechanisms were described in our manuscript (Zhong et al. (in preparation), Am. J. Epidemiol.). True marginal causal effects on the risk difference, log risk ratio and log odds ratio scales are attached to the dataset attributes (true_rd, true_logrr, true_logor).
data(eager_sim_obs)
An object of class data.frame with 200 rows and 8 columns:
binary, simulated outcome, conditional on all other covariates in the dataset
binary, simulated exposure, conditional on all other covariates except sim_Y.
binary, indicator of the eligibility stratum
count, number of prior pregnancy losses
continuous, age in years
count, months of conception attempts prior to randomization
continuous, body mass index
continuous, mean arterial blood pressure
Schisterman, E.F., Silver, R.M., Lesher, L.L., Faraggi, D., Wactawski-Wende, J., Townsend, J.M., Lynch, A.M., Perkins, N.J., Mumford, S.L. and Galai, N., 2014. Preconception low-dose aspirin and pregnancy outcomes: results from the EAGeR randomised trial. The Lancet, 384(9937), pp.29-36.
Zhong, Y., Naimi, A.I., Kennedy, E.H., (In preparation). AIPW: An R package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects. American Journal of Epidemiology
Datasets were simulated using baseline covariates (sampling with replacement) from the Effects of Aspirin in Gestation and Reproduction (EAGeR) study.
data(eager_sim_rct)
An object of class data.frame with 1228 rows and 8 columns:
binary, simulated outcome, conditional on all other covariates in the dataset
binary, simulated treatment, conditional on eligibility only.
binary, indicator of the eligibility stratum
count, number of prior pregnancy losses
continuous, age in years
count, months of conception attempts prior to randomization
continuous, body mass index
continuous, mean arterial blood pressure
Schisterman, E.F., Silver, R.M., Lesher, L.L., Faraggi, D., Wactawski-Wende, J., Townsend, J.M., Lynch, A.M., Perkins, N.J., Mumford, S.L. and Galai, N., 2014. Preconception low-dose aspirin and pregnancy outcomes: results from the EAGeR randomised trial. The Lancet, 384(9937), pp.29-36.
Zhong, Y., Naimi, A.I., Kennedy, E.H., (In preparation). AIPW: An R package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects. American Journal of Epidemiology
Fitting the data into the AIPW object with/without cross-fitting to estimate the efficient influence functions
A fitted AIPW object with obs_est and libs (public variables)
$fit()
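A minimal base R sketch of the cross-fitting idea behind fit(): each observation's nuisance predictions come from models trained on the other folds. This is an illustration under assumptions, not the package's implementation. The data are simulated, logistic regressions stand in for SuperLearner or sl3, and only the name `k_split` is borrowed from the documentation.

```r
set.seed(7)
n <- 600
k_split <- 3
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- rbinom(n, 1, plogis(-0.2 + 0.7 * A + 0.5 * W))
dat <- data.frame(Y = Y, A = A, W = W)

# Randomly assign each observation to one of k_split folds of equal size
fold <- sample(rep(seq_len(k_split), length.out = n))

p_score <- mu1 <- mu0 <- rep(NA_real_, n)
for (k in seq_len(k_split)) {
  train <- dat[fold != k, ]   # (k_split - 1)/k_split of the data to estimate
  valid <- dat[fold == k, ]   # remaining 1/k_split to predict

  g_fit <- glm(A ~ W, family = binomial(), data = train)
  Q_fit <- glm(Y ~ A + W, family = binomial(), data = train)

  # Out-of-fold predictions for the held-out observations only
  p_score[fold == k] <- predict(g_fit, newdata = valid, type = "response")
  mu1[fold == k] <- predict(Q_fit, newdata = transform(valid, A = 1), type = "response")
  mu0[fold == k] <- predict(Q_fit, newdata = transform(valid, A = 0), type = "response")
}

head(cbind(fold, p_score, mu1, mu0))
```

With `k_split = 1` there is no held-out fold, which corresponds to fitting and predicting on the same data.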
Plot the inverse probability weights using truncated propensity scores by exposure status
ip_weights.plot (public variable): A box plot of inverse probability weights using truncated propensity scores by exposure status (ggplot2::geom_boxplot)
$plot.ip_weights()
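The weights shown in the box plot can be sketched in base R as follows. This is an assumption-laden illustration: simulated data, a logistic regression for the propensity score, the conventional weight definition 1/g for the exposed and 1/(1 - g) for the unexposed, and a hypothetical variable name `g_bound` set to the documented default of 0.025.

```r
set.seed(3)
n <- 400
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.8 * W))

# Propensity scores from a simple logistic regression (assumption)
p_score <- fitted(glm(A ~ W, family = binomial()))

# Truncate the scores before inverting them
g_bound <- 0.025
p_trunc <- pmin(pmax(p_score, g_bound), 1 - g_bound)

# Inverse probability weights by exposure status
ip_weights <- ifelse(A == 1, 1 / p_trunc, 1 / (1 - p_trunc))

# Numeric counterpart of a box plot by exposure status
tapply(ip_weights, A, summary)
```

Truncation caps the largest possible weight at `1 / g_bound`, which is why the weights are computed from truncated rather than raw scores.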
Plot and check the balance of propensity scores by exposure status
g.plot (public variable): A density plot of propensity scores by exposure status (ggplot2::geom_density)
$plot.p_score()
An R6Class that allows repeated cross-fitting procedures for an AIPW object
See examples for illustration.
AIPW object
Repeated$new(aipw_obj = NULL)
Argument | Type | Details |
---|---|---|
aipw_obj | AIPW object | an AIPW object |
Methods | Details | Link |
---|---|---|
repfit() | Fit the data to the AIPW object num_reps times | repfit.Repeated |
summary_median() | Summary (median) of estimates from the repfit() | summary_median.Repeated |
Variable | Generated by | Return |
---|---|---|
repeated_estimates | repfit() | A data.frame of estimates from num_reps cross-fitting |
repeated_results | summary_median() | A list of summarised estimates |
result | summary_median() | A data.frame of summarised estimates |
repeated_estimates
Estimates from num_reps cross-fitting.
result
Summarised estimates from repeated_estimates using median methods.
Zhong Y, Kennedy EH, Bodnar LM, Naimi AI (2021). AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects. American Journal of Epidemiology.
Robins JM, Rotnitzky A (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association.
Chernozhukov V, Chetverikov D, Demirer M, et al (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.
Kennedy EH, Sjolander A, Small DS (2015). Semiparametric causal inference in matched cohort studies. Biometrika.
library(AIPW)
library(SuperLearner)
library(ggplot2)

# create an object
aipw_sl <- AIPW$new(Y = rbinom(100, 1, 0.5),
                    A = rbinom(100, 1, 0.5),
                    W.Q = rbinom(100, 1, 0.5),
                    W.g = rbinom(100, 1, 0.5),
                    Q.SL.library = "SL.mean",
                    g.SL.library = "SL.mean",
                    k_split = 2,
                    verbose = FALSE)
# create a repeated cross-fitting object from the previous step
repeated_aipw_sl <- Repeated$new(aipw_sl)
# fit repetitively (stratified = TRUE will use stratified_fit() method in AIPW class)
repeated_aipw_sl$repfit(num_reps = 3, stratified = FALSE)
# summarise the results
repeated_aipw_sl$summary_median()
Fitting the data into the AIPW object with cross-fitting repeatedly to obtain multiple estimates, so that the results depend less on the random splits in cross-fitting
num_reps | Integer. Number of repetitions of the cross-fitting procedure (Default = 20) |
stratified | Boolean. Whether to use the stratified_fit() method of the AIPW class (Default = FALSE) |
A Repeated object with repeated_estimates (estimates from num_reps repetitions)
$repfit(num_reps = 20, stratified = FALSE)
Chernozhukov V, Chetverikov D, Demirer M, et al (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.
Fit the data to the AIPW object stratified by A for the outcome model
Fitting the data into the AIPW object with/without cross-fitting to estimate the efficient influence functions. The outcome model is fitted stratified by exposure status A.
A fitted AIPW object with obs_est and libs (public variables)
$stratified_fit()
Calculate average causal effects in RD, RR and OR in the fitted AIPW or AIPW_tmle object using the estimated efficient influence functions
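g.bound | Value between [0,1] at which the propensity score should be truncated. With a single value, the propensity score is truncated to [g.bound, 1 - g.bound]; two values, as in c(0.025, 0.975), give the lower and upper bounds. |
estimates and result (public variables): Risks, average treatment effects in RD, RR and OR.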
$summary(g.bound = 0.025)
$summary(g.bound = c(0.025,0.975))
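The two usage forms suggest that g.bound may be either one value or a pair. A small sketch of how such truncation could work is given below. `truncate_p_score()` is a hypothetical helper written for this illustration, not a function of the package, and the reading that a single value means the interval [g.bound, 1 - g.bound] is an assumption.

```r
# Hypothetical helper (not part of the AIPW package)
truncate_p_score <- function(p_score, g.bound = 0.025) {
  stopifnot(length(g.bound) %in% c(1, 2),
            all(g.bound >= 0 & g.bound <= 1))
  if (length(g.bound) == 1) {
    # Assumed symmetric bounds for a single value
    bounds <- sort(c(g.bound, 1 - g.bound))
  } else {
    # Lower and upper bounds for a pair of values
    bounds <- range(g.bound)
  }
  pmin(pmax(p_score, bounds[1]), bounds[2])
}

p <- c(0.001, 0.02, 0.5, 0.98, 0.999)
truncate_p_score(p, g.bound = 0.025)
truncate_p_score(p, g.bound = c(0.01, 0.99))
```

Scores inside the bounds are left unchanged; only the extremes are pulled in, which limits the size of the inverse probability weights.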
Summary of the repeated_estimates from repfit() in the Repeated object using median methods.
From repeated_estimates, calculate the median estimate (median(Estimates)), the median SE (median(SE)), the SE adjusted for variability across the num_reps repetitions, and the 95% CI based on the adjusted SE.
repeated_results and result (public variables).
$summary_median()
Chernozhukov V, Chetverikov D, Demirer M, et al (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.
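The median summary can be sketched in base R as follows. The adjusted SE uses the median-method formula of Chernozhukov et al. (2018), the square root of the median of SE^2 + (Estimate - median estimate)^2. Whether the package computes it in exactly this way is an assumption, and the `repeated_estimates` data frame here holds simulated placeholder values, not package output.

```r
set.seed(11)
num_reps <- 20

# Placeholder estimates and SEs standing in for num_reps cross-fitting repetitions
repeated_estimates <- data.frame(
  Estimate = rnorm(num_reps, mean = 0.10, sd = 0.01),
  SE = runif(num_reps, min = 0.04, max = 0.05)
)

median_est <- median(repeated_estimates$Estimate)
median_se <- median(repeated_estimates$SE)

# SE adjusted for the variability of the estimates across repetitions
adjusted_se <- sqrt(median(
  repeated_estimates$SE^2 + (repeated_estimates$Estimate - median_est)^2
))

# Wald 95% CI based on the adjusted SE
ci <- median_est + c(-1, 1) * qnorm(0.975) * adjusted_se
c(median_est = median_est, median_se = median_se,
  adjusted_se = adjusted_se, lower = ci[1], upper = ci[2])
```

The adjustment can only widen the interval relative to the median SE, since it adds the squared deviation of each repetition's estimate from the median estimate.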