# Targeted Learning with Stochastic Treatment Regimes

## Introduction

Stochastic treatment regimes present a relatively simple manner in which to assess the effects of continuous treatments by way of parameters that examine the effects induced by the counterfactual shifting of the observed values of a treatment of interest. Here, we present an implementation of a new algorithm for computing targeted minimum loss-based estimates of treatment shift parameters defined based on a shifting function $$d(A,W)$$. For a technical presentation of the algorithm, the interested reader is invited to consult Dı́az and van der Laan (2018). For additional background on Targeted Learning and previous work on stochastic treatment regimes, please consider consulting van der Laan and Rose (2011), van der Laan and Rose (2018), and Dı́az and van der Laan (2012).

To start, let’s load the packages we’ll use and set a seed for simulation:

library(data.table)
library(haldensify)
## haldensify v0.0.6: Highly Adaptive Lasso Conditional Density Estimation
library(txshift)
## txshift v0.3.4: Efficient Estimation of the Causal Effects of Stochastic Interventions
set.seed(429153)

## Data and Notation

Consider $$n$$ observed units $$O_1, \ldots, O_n$$, where each random variable $$O = (W, A, Y)$$ corresponds to a single observational unit. Let $$W$$ denote baseline covariates (e.g., age, sex, education level), $$A$$ an intervention variable of interest (e.g., nutritional supplements), and $$Y$$ an outcome of interest (e.g., disease status). Though it need not be the case, let $$A$$ be continuous-valued, i.e. $$A \in \mathbb{R}$$. Let $$O_i \sim \mathcal{P} \in \mathcal{M}$$, where $$\mathcal{M}$$ is the nonparametric statistical model defined as the set of continuous densities on $$O$$ with respect to some dominating measure. To formalize the definition of stochastic interventions and their corresponding causal effects, we introduce a nonparametric structural equation model (NPSEM), based on Pearl (2000), to define how the system changes under posited interventions: \begin{align*}\label{eqn:npsem} W &= f_W(U_W) \\ A &= f_A(W, U_A) \\ Y &= f_Y(A, W, U_Y), \end{align*} We denote the observed data structure $$O = (W, A, Y)$$

Letting $$A$$ denote a continuous-valued treatment, we assume that the distribution of $$A$$ conditional on $$W = w$$ has support in the interval $$(l(w), u(w))$$ – for convenience, let this support be a.e. That is, the minimum natural value of treatment $$A$$ for an individual with covariates $$W = w$$ is $$l(w)$$; similarly, the maximum is $$u(w)$$. Then, a simple stochastic intervention, based on a shift $$\delta$$, may be defined $$$\label{eqn:shift} d(a, w) = \begin{cases} a - \delta & \text{if } a > l(w) + \delta \\ a & \text{if } a \leq l(w) + \delta, \end{cases}$$$ where $$0 \leq \delta \leq u(w)$$ is an arbitrary pre-specified value that defines the degree to which the observed value $$A$$ is to be shifted, where possible. For the purpose of using such a shift in practice, the present software provides estimators for a shift function that assumes that the density of treatment $$A$$, conditional on the covariates $$W$$, has support a.e.

### Simulate Data

# simulate simple data for tmle-shift sketch
n_obs <- 1000  # number of observations
n_w <- 1  # number of baseline covariates
tx_mult <- 2  # multiplier for the effect of W = 1 on the treatment

## baseline covariate -- simple, binary
W <- as.numeric(replicate(n_w, rbinom(n_obs, 1, 0.5)))

## create treatment based on baseline W
A <- as.numeric(rnorm(n_obs, mean = tx_mult * W, sd = 1))

# create outcome as a linear function of A, W + white noise
Y <- A + W + rnorm(n_obs, mean = 0, sd = 1)

# shift parameter
delta <- 0.5

## Methodology

### Estimating Stochastic Interventions Effects with Generalized Linear Models

The simplest way to compute the TML estimator for the stochastic treatment regime is to fit each of the major constituent parts (nuisance parameters) of the estimator with generalized linear models. To do this, one may merely call the txshift function, providing the standard inputs listed below.

tmle_hal_shift_1 <- txshift(W = W, A = A, Y = Y, delta = delta,
fluctuation = "standard",
g_fit_args = list(fit_type = "hal",
n_bins = 5,
grid_type = "equal_range",
lambda_seq =
exp(seq(-1, -12,
length = 500))),
Q_fit_args = list(fit_type = "glm",
glm_formula = "Y ~ .")
)
summary(tmle_hal_shift_1)
##     lwr_ci  param_est     upr_ci  param_var   eif_mean  estimator     n_iter
##     1.8971     2.0474     2.1977     0.0059 5.3505e-12       tmle          0

When computing any such TML estimator, we may, of course, vary the regressions used in fitting the nuisance parameters; however, an even simpler variation is to fit the step for the fluctuation submodels with a weighted method, simply weighting each observation by an auxiliary covariate (often denoted $$H_n$$, and sometimes called the “clever covariate” in the literature) rather than using such a covariate directly in the submodel regression fit.

tmle_hal_shift_2 <- txshift(W = W, A = A, Y = Y, delta = delta,
fluctuation = "weighted",
g_fit_args = list(fit_type = "hal",
n_bins = 5,
grid_type = "equal_range",
lambda_seq =
exp(seq(-1, -12,
length = 500))),
Q_fit_args = list(fit_type = "glm",
glm_formula = "Y ~ .")
)
summary(tmle_hal_shift_2)
##      lwr_ci   param_est      upr_ci   param_var    eif_mean   estimator
##       1.892      2.0423      2.1925      0.0059 -5.9731e-17        tmle
##      n_iter
##           0

### Interlude: Constructing Optimal Stacked Regressions with sl3

To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the sl3 R package (Coyle et al. 2020). For a complete guide on using the sl3 R package, consider consulting https://tlverse.org/sl3/ and https://tlverse.org/tlverse-handbook/sl3.html.

# SL learners to be used for most fits (e.g., IPCW, outcome regression)
mean_learner <- Lrnr_mean$new() glm_learner <- Lrnr_glm$new()
rf_learner <- Lrnr_ranger$new() Q_lib <- Stack$new(mean_learner, glm_learner, rf_learner)
sl_learner <- Lrnr_sl$new(learners = Q_lib, metalearner = Lrnr_nnls$new())

# SL learners for fitting the generalized propensity score fit
hse_learner <- make_learner(Lrnr_density_semiparametric,
mean_learner = glm_learner
)
mvd_learner <- make_learner(Lrnr_density_semiparametric,
mean_learner = rf_learner,
var_learner = glm_learner
)
g_lib <- Stack$new(hse_learner, mvd_learner) sl_learner_density <- Lrnr_sl$new(learners = g_lib,
metalearner = Lrnr_solnp_density$new()) ### Estimating Stochastic Interventions Effects with Stacked Regressions Using the framework provided by the sl3 package, the nuisance parameters of the TML estimator may be fit with ensemble learning, using the cross-validation framework of the Super Learner algorithm of van der Laan, Polley, and Hubbard (2007). tmle_sl_shift_1 <- txshift(W = W, A = A, Y = Y, delta = delta, fluctuation = "standard", g_fit_args = list(fit_type = "sl", sl_learners_density = sl_learner_density), Q_fit_args = list(fit_type = "sl", sl_learners = sl_learner) ) summary(tmle_sl_shift_1) As before, we may vary the regression for the submodel fluctuation procedure by weighting each observation by the value of the so-called clever covariate rather than using such an auxiliary covariate directly in the regression procedure: tmle_sl_shift_2 <- txshift(W = W, A = A, Y = Y, delta = delta, fluctuation = "weighted", g_fit_args = list(fit_type = "sl", sl_learners_density = sl_learner_density), Q_fit_args = list(fit_type = "sl", sl_learners = sl_learner) ) summary(tmle_sl_shift_2) ### Statistical Inference for Targeted Maximum Likelihood Estimates Recall that the asymptotic distribution of TML estimators has been studied thoroughly: $\psi_n - \psi_0 = (P_n - P_0) \cdot D(\bar{Q}_n^*, g_n) + R(\hat{P}^*, P_0),$ which, provided the following two conditions: 1. If $$D(\bar{Q}_n^{\star}, g_n)$$ converges to $$D(P_0)$$ in $$L_2(P_0)$$ norm, and 2. the size of the class of functions considered for estimation of $$\bar{Q}_n^{\star}$$ and $$g_n$$ is bounded (technically, $$\exists \mathcal{F}$$ st $$D(\bar{Q}_n^{\star}, g_n) \in \mathcal{F}$$ whp, where $$\mathcal{F}$$ is a Donsker class), readily admits the conclusion that $$\psi_n - \psi_0 = (P_n - P_0) \cdot D(P_0) + R(\hat{P}^{\star}, P_0)$$. Under the additional condition that the remainder term $$R(\hat{P}^{\star}, P_0)$$ decays as $$o_P \left( \frac{1}{\sqrt{n}} \right),$$ we have that $\psi_n - \psi_0 = (P_n - P_0) \cdot D(P_0) + o_P \left( \frac{1}{\sqrt{n}} \right),$ which, by a central limit theorem, establishes a Gaussian limiting distribution for the estimator: $\sqrt{n}(\psi_n - \psi) \to N(0, V(D(P_0))),$ where $$V(D(P_0))$$ is the variance of the efficient influence curve (canonical gradient) when $$\psi$$ admits an asymptotically linear representation. The above implies that $$\psi_n$$ is a $$\sqrt{n}$$-consistent estimator of $$\psi$$, that it is asymptotically normal (as given above), and that it is locally efficient. This allows us to build Wald-type confidence intervals in a straightforward manner: $\psi_n \pm z_{\alpha} \cdot \frac{\sigma_n}{\sqrt{n}},$ where $$\sigma_n^2$$ is an estimator of $$V(D(P_0))$$. The estimator $$\sigma_n^2$$ may be obtained using the bootstrap or computed directly via the following $\sigma_n^2 = \frac{1}{n} \sum_{i = 1}^{n} D^2(\bar{Q}_n^{\star}, g_n)(O_i)$ (ci_param <- confint(tmle_hal_shift_1)) ## lwr_ci est upr_ci ## 1.897055 2.047358 2.197661 ## Advanced Usage: User-Specified Regressions In some special cases it may be useful for the experienced user to compute the treatment mechanism and outcome regressions separately (i.e., outside of the tmle_txshift wrapper function), instead applying this user-facing wrapper only to invoke the targeting steps involved in computing the TML estimator for the treatment shift parameter. In such cases, the optional arguments gn_fit_ext and Qn_fit_ext may be utilized. We present a case of using these here: # compute treatment mechanism (propensity score) externally ## first, produce the down-shifted treatment data gn_downshift <- dnorm(A - delta, mean = tx_mult * W, sd = 1) ## next, initialize and produce the up-shifted treatment data gn_upshift <- dnorm(A + delta, mean = tx_mult * W, sd = 1) ## now, initialize and produce the up-up-shifted (2 * delta) treatment data gn_upupshift <- dnorm(A + 2 * delta, mean = tx_mult * W, sd = 1) ## then, initialize and produce the un-shifted treatment data gn_noshift <- dnorm(A, mean = tx_mult * W, sd = 1) ## finally, put it all together into an object like what's produced internally gn_out <- as.data.table(cbind(gn_downshift, gn_noshift, gn_upshift, gn_upupshift)) setnames(gn_out, c("downshift", "noshift", "upshift", "upupshift")) # compute outcome regression externally Qn_noshift <- (W + A - min(Y)) / diff(range(Y)) Qn_upshift <- (W + A + delta - min(Y)) / diff(range(Y)) Qn_noshift[Qn_noshift < 0] <- .Machine$double.neg.eps
Qn_noshift[Qn_noshift > 1] <- 1 - .Machine$double.neg.eps Qn_upshift[Qn_upshift < 0] <- .Machine$double.neg.eps
Qn_upshift[Qn_upshift > 1] <- 1 - .Machine\$double.neg.eps
Qn_out <- as.data.table(cbind(Qn_noshift, Qn_upshift))
setnames(Qn_out, c("noshift", "upshift"))

# invoke the wrapper function only to apply the targeting step
tmle_shift_spec <- txshift(W = W, A = A, Y = Y, delta = delta,
fluctuation = "standard",
ipcw_fit_args = NULL,
g_fit_args = list(fit_type = "external"),
Q_fit_args = list(fit_type = "external"),
gn_fit_ext = gn_out,
Qn_fit_ext = Qn_out)
summary(tmle_shift_spec)
##     lwr_ci  param_est     upr_ci  param_var   eif_mean  estimator     n_iter
##      1.933     2.0689     2.2047     0.0048 4.2908e-11       tmle          0

## References

Coyle, Jeremy R, Nima S Hejazi, Ivana Malenica, and Oleg Sofrygin. 2020. sl3: Modern Pipelines for Machine Learning and Super Learning. https://github.com/tlverse/sl3. https://doi.org/10.5281/zenodo.1342293.

Dı́az, Iván, and Mark J van der Laan. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” Biometrics 68 (2): 541–49.

———. 2018. “Stochastic Treatment Regimes.” In Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies, 167–80. Springer Science & Business Media.

Pearl, Judea. 2000. Causality. Cambridge university press.

van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).

van der Laan, Mark J, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.

———. 2018. Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies. Springer Science & Business Media.