Causal Inference with Interference

RCT2

This package provides various statistical methods for designing and analyzing two-stage randomized controlled trials. Two-stage randomized controlled trials can be used to estimate spillover effects as well as direct treatment effects.

Motivation

The methods in this package address situations where some control units decide to take the treatment while others in the treatment group refuse to receive it. Researchers often cannot force experimental subjects to adhere to protocol, and the methods in this package allow analysis of two-stage randomized experiments with both interference and noncompliance.

Study Design

RSBY, India’s National Health Insurance Program, provides access to an insurance plan that covers all pre-existing diseases and places no age limit on beneficiaries. The data were collected through a randomized trial designed to determine whether RSBY increases access to hospitalization (and health) and reduces impoverishment due to high medical expenses. The Indian government announced a new scheme to build on RSBY and provide coverage for almost 500 million Indians, but has not yet decided its design or how much to fund it. Spillover effects are of concern because formal insurance may crowd out informal insurance; the enrollment in RSBY by one household may depend on the treatment assignment of other households. Additionally, we must address noncompliance because some households in the treatment group decided not to enroll in RSBY while some in the control group were able to join the insurance program.

The evaluation study is based on a total of 11,089 above-poverty-line households in two districts of Karnataka State with no pre-existing health insurance coverage, living within 25 km of an RSBY-empaneled hospital. A two-stage randomized design was employed to study both the direct and spillover effects of RSBY. In the first stage, 219 randomly selected villages were assigned to the “High” treatment assignment mechanism and the rest were assigned to the “Low” treatment assignment mechanism. In the second stage, under the “High” assignment mechanism, 80% of the households within a cluster were completely randomly assigned to the treatment condition, while the rest of the households were assigned to the control group. In contrast, under the “Low” assignment mechanism, 40% of the households within a cluster were completely randomly assigned to the treatment condition.

The households in the treatment group were given RSBY for free, whereas some households in the control group could buy RSBY for around INR 200. Upon being informed of their assigned treatment conditions, households were given the opportunity to enroll in RSBY from April to May 2015. The post-treatment survey was carried out 18 months later and measured a variety of outcomes.

Village-level arms                Household-level arms
Mechanism   Number of villages    Treatment   Control   Number of households   Enrollment rate
High        219                   80%         20%       5,714                  67.0%
Low         216                   40%         60%       5,373                  46.2%
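
The two-stage assignment described in this section can be sketched in a few lines of base R. This is a minimal illustration on a made-up population, not the code used in the study: the number of villages, the number of households per village, the equal split of villages between mechanisms, and the 0/1 coding of A and Z are all assumptions chosen for the example.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical toy population: 10 villages with 20 households each.
n_villages <- 10
hh_per_village <- 20
households <- data.frame(
  id = rep(seq_len(n_villages), each = hh_per_village)
)

# Stage 1: completely randomize villages to the "High" (A = 1) or
# "Low" (A = 0) mechanism; here half of the villages receive "High".
high_villages <- sample(seq_len(n_villages), size = n_villages / 2)
households$A <- as.integer(households$id %in% high_villages)

# Stage 2: within each village, completely randomize households to
# treatment (Z = 1), treating 80% under "High" and 40% under "Low".
assign_within_village <- function(a) {
  n <- length(a)
  share <- if (a[1] == 1) 0.8 else 0.4
  n_treated <- round(n * share)
  sample(rep(c(1L, 0L), times = c(n_treated, n - n_treated)))
}
households$Z <- ave(households$A, households$id, FUN = assign_within_village)

# Share of treated households under each mechanism.
treated_share <- tapply(households$Z, households$A, mean)
print(treated_share)
```

Because the second stage uses complete randomization within each village, the treated share in every village equals the target share exactly rather than only in expectation.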

Data

The data set is a subset of the data from the randomized evaluation of India’s National Health Insurance Program (RSBY). The data initially contained six variables; after processing for the purposes of the package, four variables of interest remain for analysis, as listed below:

Z: treatment status

A: treatment assignment mechanism

D: enrolled in RSBY

Y: hospital expenditure (the outcome variable).
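
To make the roles of these variables concrete, the sketch below builds a small made-up data frame with the same column names and tabulates assignment against enrollment. Every value is invented for illustration, and the 0/1 codings of A and Z are assumptions; the id column mirrors the cluster identifier that appears in the example code later in this document.

```r
# Hypothetical toy data with the four variables listed above plus a
# village identifier; every value here is made up for illustration.
toy <- data.frame(
  id = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),   # village (cluster) identifier
  A  = c(1, 1, 1, 1, 0, 0, 0, 0),           # assumed coding: 1 = "High", 0 = "Low"
  Z  = c(1, 1, 1, 0, 1, 0, 0, 0),           # assumed coding: 1 = treatment, 0 = control
  D  = c(1, 1, 0, 1, 1, 0, 0, 0),           # 1 = enrolled in RSBY
  Y  = c(0, 1200, 300, 0, 500, 0, 2500, 0)  # hospital expenditure
)

# Noncompliance shows up as off-diagonal cells: assigned to treatment but
# not enrolled (Z = 1, D = 0), or assigned to control but enrolled
# (Z = 0, D = 1).
compliance_table <- table(Z = toy$Z, D = toy$D)
print(compliance_table)

# Enrollment rate by assignment mechanism and treatment assignment.
enroll_rates <- aggregate(D ~ A + Z, data = toy, FUN = mean)
print(enroll_rates)
```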

Overview

There are three functions in this package:

1. CADErand: computes the point estimates and variance estimates of the complier average direct effect (CADE) and the complier average spillover effect (CASE). The estimators calculated using this function are either individual-weighted or cluster-weighted. The point estimates and variances of ITT effects are also included.

2. CADEreg: computes the point estimates of the complier average direct effect (CADE) and four different variance estimates: the HC2 variance, the cluster-robust variance, the cluster-robust HC2 variance and the variance proposed in the reference.

3. CADEparamreg: computes the point estimates of the complier average direct effect (CADE) and the complier average spillover effect (CASE) following the model-based approach presented in the appendix.

Functions

Before we begin, let's load the library and our example data set into R.

library(RCT2)
data(india)
india$id <- factor(india$id)

To run the CADErand command, simply type in the following:

rand <- CADErand(india, 0.95)
print(rand)
##    names reps2     estimate        variance        stds       lcis       rcis
## 1   CADE     0   1984.42477   1474406.32319  1214.25134   1908.283   2060.567
## 2   CADE     1  -1648.53065   1128010.63709  1062.07845   -1715.13  -1581.931
## 3   CASE     0   6568.04971 335327387.08457 18311.94657   5419.767   7716.333
## 4   CASE     1 -15900.39663 237301575.94459 15404.59594 -16866.369 -14934.424
## 5    DEY     0    875.43729    280649.14689   529.76329    842.218    908.657
## 6    DEY     1   -795.24119    263884.36586   513.69676   -827.453   -763.029
## 7    DED     0      0.44115         0.00044     0.02099     242.86    350.527
## 8    DED     1      0.48239         0.00052     0.02277  -1425.617  -1322.353
## 9    SEY     0    296.69351    737020.66452   858.49908       0.44      0.442
## 10   SEY     1  -1373.98496    677958.83661   823.38256      0.481      0.484
## 11   SED     0      0.04517         0.00077     0.02778      0.043      0.047
## 12   SED     1      0.08641         0.00281     0.05298      0.083       0.09

Note that you can specify the confidence level of your choosing with the ci parameter of the CADErand function. You can access any specific value with the $ operator. For example, the following returns just the CADE estimates:

rand$CADE
##      A_cluster0 A_cluster1
## [1,]   1984.425  -1648.531
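
The rows of the CADErand printout are related in a way that can be checked by hand. In the numbers shown above, each CADE estimate is close to DEY divided by DED for the same assignment mechanism, each CASE estimate is close to SEY divided by SED, and the stds column is the square root of the variance column. The sketch below redoes that arithmetic with values copied from the printout. It is an observation about these printed numbers (DEY, DED, SEY, and SED presumably being the ITT direct and spillover effects on Y and D), not a description of the package internals, and the vector names are made up for illustration.

```r
# Values copied from the CADErand printout above (as rounded there).
# Element names "a0"/"a1" are illustrative labels for reps2 = 0 and 1.
dey <- c(a0 = 875.43729, a1 = -795.24119)
ded <- c(a0 = 0.44115,   a1 = 0.48239)
sey <- c(a0 = 296.69351, a1 = -1373.98496)
sed <- c(a0 = 0.04517,   a1 = 0.08641)

# Ratios of the outcome effects to the enrollment effects.
cade_by_hand <- dey / ded
case_by_hand <- sey / sed

cade_printed <- c(a0 = 1984.42477, a1 = -1648.53065)
case_printed <- c(a0 = 6568.04971, a1 = -15900.39663)

# Differences are small, consistent with rounding in the printed inputs.
print(cade_by_hand - cade_printed)
print(case_by_hand - case_printed)

# The first stds entry matches the square root of the first variance entry.
print(sqrt(1474406.32319))
```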

To analyze our data using a regression-based method, we use the CADEreg function.

reg <- CADEreg(india, ci.level = 0.90)
print(reg)
## [[1]]
##    name          estimate           left CI         right CI
## 1 CADE1 -485.205567558982 -2604427.23867043 2603456.82753532
## 2 CADE0  3751.62334625516 -4491562.03320184 4490591.62206672
##
## [[2]]
## cluster_robust_variance        1844654    2692774
## HC2_variance                   1759371    3036458
## cluster_robust_HC2_variance    1853609    2705597
## individual_variance            1307332    2754695
## proposed_variance              1583084    2730381

This gives us the point estimates of CADE1 and CADE0 with their confidence intervals, along with several types of variance estimates for CADE1 and CADE0. We can again access these using the dollar sign notation. Note that we can use the ci.level parameter to specify the confidence level (e.g., 0.95 or 0.90).

reg$CADE1
##         M
## -485.2056

CADEparamreg offers a regression-based method for computing the ITT effects and the average direct and spillover effects.

paramreg <- CADEparamreg(india, assign.prob = 0.8, ci.level = 0.95)
print(paramreg)
## [[1]]
##   Method Treatment Control  Treatment CI  Control CI
## 1 ITT DE     -1253    1447    -2646, 139   -94, 2988
## 2  IV DE     -6013    8724  -4631, -1131 -1991, 1630
## 3 ITT SE     -2881    -180   -11872, 139   407, 2988
## 4  IV SE    -11715    3022 -19445, -1131 -4927, 1630
##
## [[2]]
##              ITT pvalues  ITT tstat IV pvalues   IV tstat
## (Intercept) 6.731101e-31 11.5966578 0.11434527  1.5790968
## Z           2.917099e-02  2.1814799 0.02241845  2.2835546
## A           1.148445e+00 -0.1871396 0.47074057  0.7213018
## Z:A         1.972694e+00 -2.2074303 1.97260512 -2.2061660
paramreg[[1]]
##   Method Treatment Control  Treatment CI  Control CI
## 1 ITT DE     -1253    1447    -2646, 139   -94, 2988
## 2  IV DE     -6013    8724  -4631, -1131 -1991, 1630
## 3 ITT SE     -2881    -180   -11872, 139   407, 2988
## 4  IV SE    -11715    3022 -19445, -1131 -4927, 1630

Note how we use assign.prob to specify the assignment probability for the different assignment mechanisms. We again use ci.level to specify the confidence level.