DisImpact Tutorial

Vinh Nguyen

2019-10-23

The DisImpact R package contains functions that help in determining disproportionate impact (DI) based on the following methodologies:

  1. percentage point gap (PPG) method,
  2. proportionality index method (method #1 in reference), and
  3. 80% index method (method #2 in reference).

Install Package

# From CRAN (Official)
## install.packages('DisImpact')

# From github (Development)
## devtools::install_github('vinhdizzo/DisImpact')

Load Packages

library(DisImpact)
library(dplyr) # Ease in manipulations with data frames

Load toy student equity data

To illustrate the functionality of the package, let’s load a toy data set:

# Load fake data set
data(student_equity)

The toy data set can be summarized as follows:

# Summarize toy data
dim(student_equity)
## [1] 20000     4
dSumm <- student_equity %>%
  group_by(Cohort, Ethnicity) %>%
  summarize(n=n(), Transfer_Rate=mean(Transfer))
## Warning: The `printer` argument is deprecated as of rlang 0.3.0.
## This warning is displayed once per session.
dSumm
## # A tibble: 12 x 4
## # Groups:   Cohort [?]
##    Cohort Ethnicity           n Transfer_Rate
##     <int> <chr>           <int>         <dbl>
##  1   2017 Asian            3000         0.687
##  2   2017 Black            1000         0.31 
##  3   2017 Hispanic         2000         0.205
##  4   2017 Multi-Ethnicity   500         0.524
##  5   2017 Native American   100         0.43 
##  6   2017 White            3400         0.604
##  7   2018 Asian            3000         0.743
##  8   2018 Black            1000         0.297
##  9   2018 Hispanic         2000         0.218
## 10   2018 Multi-Ethnicity   500         0.484
## 11   2018 Native American   100         0.35 
## 12   2018 White            3400         0.631

Percentage point gap (PPG) method

di_ppg is the main work function, and it can take on vectors or column names the tidy way:

# Vector
di_ppg(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
## Warning: `list_len()` is deprecated as of rlang 0.2.0.
## Please use `new_list()` instead.
## This warning is displayed once per session.
##             group    n success       pct reference        moe    pct_lo
## 1           Asian 6000    4292 0.7153333    0.5264 0.03000000 0.6853333
## 2           Black 2000     607 0.3035000    0.5264 0.03000000 0.2735000
## 3        Hispanic 4000     847 0.2117500    0.5264 0.03000000 0.1817500
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264 0.03099032 0.4730097
## 5 Native American  200      78 0.3900000    0.5264 0.06929646 0.3207035
## 6           White 6800    4200 0.6176471    0.5264 0.03000000 0.5876471
##      pct_hi di_indicator
## 1 0.7453333            0
## 2 0.3335000            1
## 3 0.2417500            1
## 4 0.5349903            0
## 5 0.4592965            1
## 6 0.6476471            0
# Tidy and column reference
di_ppg(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference        moe    pct_lo
## 1           Asian 6000    4292 0.7153333    0.5264 0.03000000 0.6853333
## 2           Black 2000     607 0.3035000    0.5264 0.03000000 0.2735000
## 3        Hispanic 4000     847 0.2117500    0.5264 0.03000000 0.1817500
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264 0.03099032 0.4730097
## 5 Native American  200      78 0.3900000    0.5264 0.06929646 0.3207035
## 6           White 6800    4200 0.6176471    0.5264 0.03000000 0.5876471
##      pct_hi di_indicator
## 1 0.7453333            0
## 2 0.3335000            1
## 3 0.2417500            1
## 4 0.5349903            0
## 5 0.4592965            1
## 6 0.6476471            0

Sometimes, one might want to break out the DI calculation by cohort:

# Cohort
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference        moe
## 1    2017           Asian 3000    2062 0.6873333    0.5140 0.03000000
## 2    2017           Black 1000     310 0.3100000    0.5140 0.03099032
## 3    2017        Hispanic 2000     410 0.2050000    0.5140 0.03000000
## 4    2017 Multi-Ethnicity  500     262 0.5240000    0.5140 0.04382693
## 5    2017 Native American  100      43 0.4300000    0.5140 0.09800000
## 6    2017           White 3400    2053 0.6038235    0.5140 0.03000000
## 7    2018           Asian 3000    2230 0.7433333    0.5388 0.03000000
## 8    2018           Black 1000     297 0.2970000    0.5388 0.03099032
## 9    2018        Hispanic 2000     437 0.2185000    0.5388 0.03000000
## 10   2018 Multi-Ethnicity  500     242 0.4840000    0.5388 0.04382693
## 11   2018 Native American  100      35 0.3500000    0.5388 0.09800000
## 12   2018           White 3400    2147 0.6314706    0.5388 0.03000000
##       pct_lo    pct_hi di_indicator
## 1  0.6573333 0.7173333            0
## 2  0.2790097 0.3409903            1
## 3  0.1750000 0.2350000            1
## 4  0.4801731 0.5678269            0
## 5  0.3320000 0.5280000            0
## 6  0.5738235 0.6338235            0
## 7  0.7133333 0.7733333            0
## 8  0.2660097 0.3279903            1
## 9  0.1885000 0.2485000            1
## 10 0.4401731 0.5278269            1
## 11 0.2520000 0.4480000            1
## 12 0.6014706 0.6614706            0

di_ppg is also applicable to summarized data; just pass the counts to success and group size to weight:

di_ppg(success=Transfer_Rate*n, group=Ethnicity, cohort=Cohort, weight=n, data=dSumm) %>%
  as.data.frame
##    cohort           group    n success       pct reference        moe
## 1    2017           Asian 3000    2062 0.6873333    0.5140 0.03000000
## 2    2017           Black 1000     310 0.3100000    0.5140 0.03099032
## 3    2017        Hispanic 2000     410 0.2050000    0.5140 0.03000000
## 4    2017 Multi-Ethnicity  500     262 0.5240000    0.5140 0.04382693
## 5    2017 Native American  100      43 0.4300000    0.5140 0.09800000
## 6    2017           White 3400    2053 0.6038235    0.5140 0.03000000
## 7    2018           Asian 3000    2230 0.7433333    0.5388 0.03000000
## 8    2018           Black 1000     297 0.2970000    0.5388 0.03099032
## 9    2018        Hispanic 2000     437 0.2185000    0.5388 0.03000000
## 10   2018 Multi-Ethnicity  500     242 0.4840000    0.5388 0.04382693
## 11   2018 Native American  100      35 0.3500000    0.5388 0.09800000
## 12   2018           White 3400    2147 0.6314706    0.5388 0.03000000
##       pct_lo    pct_hi di_indicator
## 1  0.6573333 0.7173333            0
## 2  0.2790097 0.3409903            1
## 3  0.1750000 0.2350000            1
## 4  0.4801731 0.5678269            0
## 5  0.3320000 0.5280000            0
## 6  0.5738235 0.6338235            0
## 7  0.7133333 0.7733333            0
## 8  0.2660097 0.3279903            1
## 9  0.1885000 0.2485000            1
## 10 0.4401731 0.5278269            1
## 11 0.2520000 0.4480000            1
## 12 0.6014706 0.6614706            0

The user could also pass in custom reference points for comparison (eg, a state-wide rate). di_ppg accepts either a single reference point to be used or a vector of reference points, one for each cohort. For the latter, the vector of reference points will be taken to correspond to the cohort variable, alphabetically ordered.

# With custom reference (single)
di_ppg(success=Transfer, group=Ethnicity, reference=0.54, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference        moe    pct_lo
## 1           Asian 6000    4292 0.7153333      0.54 0.03000000 0.6853333
## 2           Black 2000     607 0.3035000      0.54 0.03000000 0.2735000
## 3        Hispanic 4000     847 0.2117500      0.54 0.03000000 0.1817500
## 4 Multi-Ethnicity 1000     504 0.5040000      0.54 0.03099032 0.4730097
## 5 Native American  200      78 0.3900000      0.54 0.06929646 0.3207035
## 6           White 6800    4200 0.6176471      0.54 0.03000000 0.5876471
##      pct_hi di_indicator
## 1 0.7453333            0
## 2 0.3335000            1
## 3 0.2417500            1
## 4 0.5349903            1
## 5 0.4592965            1
## 6 0.6476471            0
# With custom reference (multiple)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference=c(0.5, 0.55), data=student_equity) %>%
  as.data.frame
## Joining, by = "cohort"
## Warning: `chr_along()` is deprecated as of rlang 0.2.0.
## This warning is displayed once per session.
##    cohort           group    n success       pct reference        moe
## 1    2017           Asian 3000    2062 0.6873333      0.50 0.03000000
## 2    2017           Black 1000     310 0.3100000      0.50 0.03099032
## 3    2017        Hispanic 2000     410 0.2050000      0.50 0.03000000
## 4    2017 Multi-Ethnicity  500     262 0.5240000      0.50 0.04382693
## 5    2017 Native American  100      43 0.4300000      0.50 0.09800000
## 6    2017           White 3400    2053 0.6038235      0.50 0.03000000
## 7    2018           Asian 3000    2230 0.7433333      0.55 0.03000000
## 8    2018           Black 1000     297 0.2970000      0.55 0.03099032
## 9    2018        Hispanic 2000     437 0.2185000      0.55 0.03000000
## 10   2018 Multi-Ethnicity  500     242 0.4840000      0.55 0.04382693
## 11   2018 Native American  100      35 0.3500000      0.55 0.09800000
## 12   2018           White 3400    2147 0.6314706      0.55 0.03000000
##       pct_lo    pct_hi di_indicator
## 1  0.6573333 0.7173333            0
## 2  0.2790097 0.3409903            1
## 3  0.1750000 0.2350000            1
## 4  0.4801731 0.5678269            0
## 5  0.3320000 0.5280000            0
## 6  0.5738235 0.6338235            0
## 7  0.7133333 0.7733333            0
## 8  0.2660097 0.3279903            1
## 9  0.1885000 0.2485000            1
## 10 0.4401731 0.5278269            1
## 11 0.2520000 0.4480000            1
## 12 0.6014706 0.6614706            0

The margin of error (MOE) in di_ppg has 2 underlying assumptions (defaults):

  1. the minimum MOE returned is 0.03, and
  2. using 0.50 as the proportion in the margin of error formula, \(1.96 \times \sqrt{\hat{p} (1-\hat{p}) / n}\).

To override 1, the user could specify min_moe in di_ppg. To override 2, the user could specify use_prop_in_moe=TRUE in di_ppg.

# min_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02) %>%
  as.data.frame
##             group    n success       pct reference        moe    pct_lo
## 1           Asian 6000    4292 0.7153333    0.5264 0.02000000 0.6953333
## 2           Black 2000     607 0.3035000    0.5264 0.02191347 0.2815865
## 3        Hispanic 4000     847 0.2117500    0.5264 0.02000000 0.1917500
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264 0.03099032 0.4730097
## 5 Native American  200      78 0.3900000    0.5264 0.06929646 0.3207035
## 6           White 6800    4200 0.6176471    0.5264 0.02000000 0.5976471
##      pct_hi di_indicator
## 1 0.7353333            0
## 2 0.3254135            1
## 3 0.2317500            1
## 4 0.5349903            0
## 5 0.4592965            1
## 6 0.6376471            0
# use_prop_in_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02, use_prop_in_moe=TRUE) %>%
  as.data.frame
##             group    n success       pct reference        moe    pct_lo
## 1           Asian 6000    4292 0.7153333    0.5264 0.02000000 0.6953333
## 2           Black 2000     607 0.3035000    0.5264 0.02015028 0.2833497
## 3        Hispanic 4000     847 0.2117500    0.5264 0.02000000 0.1917500
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264 0.03098933 0.4730107
## 5 Native American  200      78 0.3900000    0.5264 0.06759869 0.3224013
## 6           White 6800    4200 0.6176471    0.5264 0.02000000 0.5976471
##      pct_hi di_indicator
## 1 0.7353333            0
## 2 0.3236503            1
## 3 0.2317500            1
## 4 0.5349893            0
## 5 0.4575987            1
## 6 0.6376471            0

In cases where the proportion is used in calculating MOE, a proportion of 0 or 1 would lead to a zero MOE. To account for these scenarios, the user could leverage the prop_sub_0 and prop_sub_1 parameters in di_ppg and ppg_moe as substitutes. These parameters default to 0.5.

# Set Native American to have have zero transfers and see what the results
di_ppg(success=Transfer, group=Ethnicity, data=student_equity %>% mutate(Transfer=ifelse(Ethnicity=='Native American', 0, Transfer)), use_prop_in_moe=TRUE, prop_sub_0=0.1, prop_sub_1=0.9) %>%
  as.data.frame
## Warning in ppg_moe(n = n, proportion = pct, min_moe = min_moe, prop_sub_0 =
## prop_sub_0, : The vector `proportion` contains 0. This will lead to a zero
## MOE. `prop_sub_0=0.1` will be used in calculating the MOE for these cases.
##             group    n success       pct reference        moe      pct_lo
## 1           Asian 6000    4292 0.7153333    0.5225 0.03000000  0.68533333
## 2           Black 2000     607 0.3035000    0.5225 0.03000000  0.27350000
## 3        Hispanic 4000     847 0.2117500    0.5225 0.03000000  0.18175000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5225 0.03098933  0.47301067
## 5 Native American  200       0 0.0000000    0.5225 0.04157788 -0.04157788
## 6           White 6800    4200 0.6176471    0.5225 0.03000000  0.58764706
##       pct_hi di_indicator
## 1 0.74533333            0
## 2 0.33350000            1
## 3 0.24175000            1
## 4 0.53498933            0
## 5 0.04157788            1
## 6 0.64764706            0

Proportionality index

di_prop_index is the main work function for this method, and it can take on vectors or column names the tidy way:

# Without cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success pct_success pct_group di_prop_index
## 1           Asian 6000    4292 0.407674772      0.30     1.3589159
## 2           Black 2000     607 0.057655775      0.10     0.5765578
## 3        Hispanic 4000     847 0.080452128      0.20     0.4022606
## 4 Multi-Ethnicity 1000     504 0.047872340      0.05     0.9574468
## 5 Native American  200      78 0.007408815      0.01     0.7408815
## 6           White 6800    4200 0.398936170      0.34     1.1733417
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success pct_success pct_group di_prop_index
## 1           Asian 6000    4292 0.407674772      0.30     1.3589159
## 2           Black 2000     607 0.057655775      0.10     0.5765578
## 3        Hispanic 4000     847 0.080452128      0.20     0.4022606
## 4 Multi-Ethnicity 1000     504 0.047872340      0.05     0.9574468
## 5 Native American  200      78 0.007408815      0.01     0.7408815
## 6           White 6800    4200 0.398936170      0.34     1.1733417
# With cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944

80% index

di_80_index is the main work function for this method, and it can take on vectors or column names the tidy way:

# Without cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success       pct reference di_80_index
## 1           Asian 6000    4292 0.7153333 0.7153333   1.0000000
## 2           Black 2000     607 0.3035000 0.7153333   0.4242777
## 3        Hispanic 4000     847 0.2117500 0.7153333   0.2960158
## 4 Multi-Ethnicity 1000     504 0.5040000 0.7153333   0.7045666
## 5 Native American  200      78 0.3900000 0.7153333   0.5452004
## 6           White 6800    4200 0.6176471 0.7153333   0.8634395
##   di_indicator
## 1            0
## 2            1
## 3            1
## 4            1
## 5            1
## 6            0
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference di_80_index
## 1           Asian 6000    4292 0.7153333 0.7153333   1.0000000
## 2           Black 2000     607 0.3035000 0.7153333   0.4242777
## 3        Hispanic 4000     847 0.2117500 0.7153333   0.2960158
## 4 Multi-Ethnicity 1000     504 0.5040000 0.7153333   0.7045666
## 5 Native American  200      78 0.3900000 0.7153333   0.5452004
## 6           White 6800    4200 0.6176471 0.7153333   0.8634395
##   di_indicator
## 1            0
## 2            1
## 3            1
## 4            1
## 5            1
## 6            0
# With cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
##    cohort           group    n success       pct reference di_80_index
## 1    2017           Asian 3000    2062 0.6873333 0.6873333   1.0000000
## 2    2017           Black 1000     310 0.3100000 0.6873333   0.4510184
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333   0.2982541
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333   0.7623666
## 5    2017 Native American  100      43 0.4300000 0.6873333   0.6256062
## 6    2017           White 3400    2053 0.6038235 0.6873333   0.8785017
## 7    2018           Asian 3000    2230 0.7433333 0.7433333   1.0000000
## 8    2018           Black 1000     297 0.2970000 0.7433333   0.3995516
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333   0.2939462
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333   0.6511211
## 11   2018 Native American  100      35 0.3500000 0.7433333   0.4708520
## 12   2018           White 3400    2147 0.6314706 0.7433333   0.8495120
##    di_indicator
## 1             0
## 2             1
## 3             1
## 4             1
## 5             1
## 6             0
## 7             0
## 8             1
## 9             1
## 10            1
## 11            1
## 12            0
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference di_80_index
## 1    2017           Asian 3000    2062 0.6873333 0.6873333   1.0000000
## 2    2017           Black 1000     310 0.3100000 0.6873333   0.4510184
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333   0.2982541
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333   0.7623666
## 5    2017 Native American  100      43 0.4300000 0.6873333   0.6256062
## 6    2017           White 3400    2053 0.6038235 0.6873333   0.8785017
## 7    2018           Asian 3000    2230 0.7433333 0.7433333   1.0000000
## 8    2018           Black 1000     297 0.2970000 0.7433333   0.3995516
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333   0.2939462
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333   0.6511211
## 11   2018 Native American  100      35 0.3500000 0.7433333   0.4708520
## 12   2018           White 3400    2147 0.6314706 0.7433333   0.8495120
##    di_indicator
## 1             0
## 2             1
## 3             1
## 4             1
## 5             1
## 6             0
## 7             0
## 8             1
## 9             1
## 10            1
## 11            1
## 12            0

When we have a non-success variable like drop-out

Suppose we have a variable that indicates something negative. We could calculate DI on the converse of it:

di_ppg(success=!Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference        moe    pct_lo
## 1           Asian 6000    1708 0.2846667    0.4736 0.03000000 0.2546667
## 2           Black 2000    1393 0.6965000    0.4736 0.03000000 0.6665000
## 3        Hispanic 4000    3153 0.7882500    0.4736 0.03000000 0.7582500
## 4 Multi-Ethnicity 1000     496 0.4960000    0.4736 0.03099032 0.4650097
## 5 Native American  200     122 0.6100000    0.4736 0.06929646 0.5407035
## 6           White 6800    2600 0.3823529    0.4736 0.03000000 0.3523529
##      pct_hi di_indicator
## 1 0.3146667            1
## 2 0.7265000            0
## 3 0.8182500            0
## 4 0.5269903            0
## 5 0.6792965            0
## 6 0.4123529            1

Transformations on the fly

We can compute the success, group, and cohort variables on the fly:

# Transform success
a <- sample(0:1, size=nrow(student_equity), replace=TRUE, prob=c(0.95, 0.05))
mean(a)
## [1] 0.04815
di_ppg(success=pmax(Transfer, a), group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference        moe    pct_lo
## 1           Asian 6000    4385 0.7308333    0.5488 0.03000000 0.7008333
## 2           Black 2000     676 0.3380000    0.5488 0.03000000 0.3080000
## 3        Hispanic 4000     982 0.2455000    0.5488 0.03000000 0.2155000
## 4 Multi-Ethnicity 1000     532 0.5320000    0.5488 0.03099032 0.5010097
## 5 Native American  200      84 0.4200000    0.5488 0.06929646 0.3507035
## 6           White 6800    4317 0.6348529    0.5488 0.03000000 0.6048529
##      pct_hi di_indicator
## 1 0.7608333            0
## 2 0.3680000            1
## 3 0.2755000            1
## 4 0.5629903            0
## 5 0.4892965            1
## 6 0.6648529            0
# Collapse Black and Hispanic
di_ppg(success=Transfer, group=ifelse(Ethnicity %in% c('Black', 'Hispanic'), 'Black/Hispanic', Ethnicity), data=student_equity) %>% as.data.frame
##             group    n success       pct reference        moe    pct_lo
## 1           Asian 6000    4292 0.7153333    0.5264 0.03000000 0.6853333
## 2  Black/Hispanic 6000    1454 0.2423333    0.5264 0.03000000 0.2123333
## 3 Multi-Ethnicity 1000     504 0.5040000    0.5264 0.03099032 0.4730097
## 4 Native American  200      78 0.3900000    0.5264 0.06929646 0.3207035
## 5           White 6800    4200 0.6176471    0.5264 0.03000000 0.5876471
##      pct_hi di_indicator
## 1 0.7453333            0
## 2 0.2723333            1
## 3 0.5349903            0
## 4 0.4592965            1
## 5 0.6476471            0

Calculate DI for many variables and groups

It is often the case that the user desires to calculate disproportionate impact across many outcome variables and many disaggregation/group variables. The function di_ppg_iterate allows the user to specify a data set and the various variables to iterate across:

# Multiple group variables
di_ppg_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), reference_groups='overall')
## Joining, by = "group_var"
## # A tibble: 16 x 13
##    success_variable disaggregation reference_group cohort group          n
##    <chr>            <chr>          <chr>            <int> <chr>      <dbl>
##  1 Transfer         Ethnicity      overall           2017 Asian       3000
##  2 Transfer         Ethnicity      overall           2017 Black       1000
##  3 Transfer         Ethnicity      overall           2017 Hispanic    2000
##  4 Transfer         Ethnicity      overall           2017 Multi-Eth~   500
##  5 Transfer         Ethnicity      overall           2017 Native Am~   100
##  6 Transfer         Ethnicity      overall           2017 White       3400
##  7 Transfer         Ethnicity      overall           2018 Asian       3000
##  8 Transfer         Ethnicity      overall           2018 Black       1000
##  9 Transfer         Ethnicity      overall           2018 Hispanic    2000
## 10 Transfer         Ethnicity      overall           2018 Multi-Eth~   500
## 11 Transfer         Ethnicity      overall           2018 Native Am~   100
## 12 Transfer         Ethnicity      overall           2018 White       3400
## 13 Transfer         Gender         overall           2017 Female      5080
## 14 Transfer         Gender         overall           2017 Male        4920
## 15 Transfer         Gender         overall           2018 Female      4965
## 16 Transfer         Gender         overall           2018 Male        5035
## # ... with 7 more variables: success <int>, pct <dbl>, reference <dbl>,
## #   moe <dbl>, pct_lo <dbl>, pct_hi <dbl>, di_indicator <dbl>
# Multiple group variables and different reference groups
bind_rows(
  di_ppg_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), reference_groups=c('White', 'Male'))
  , di_ppg_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), reference_groups='overall')
)
## Joining, by = "group_var"
## Joining, by = "cohort"
## Joining, by = "cohort"
## Joining, by = "group_var"
## # A tibble: 32 x 13
##    success_variable disaggregation reference_group cohort group          n
##    <chr>            <chr>          <chr>            <int> <chr>      <dbl>
##  1 Transfer         Ethnicity      White             2017 Asian       3000
##  2 Transfer         Ethnicity      White             2017 Black       1000
##  3 Transfer         Ethnicity      White             2017 Hispanic    2000
##  4 Transfer         Ethnicity      White             2017 Multi-Eth~   500
##  5 Transfer         Ethnicity      White             2017 Native Am~   100
##  6 Transfer         Ethnicity      White             2017 White       3400
##  7 Transfer         Ethnicity      White             2018 Asian       3000
##  8 Transfer         Ethnicity      White             2018 Black       1000
##  9 Transfer         Ethnicity      White             2018 Hispanic    2000
## 10 Transfer         Ethnicity      White             2018 Multi-Eth~   500
## # ... with 22 more rows, and 7 more variables: success <int>, pct <dbl>,
## #   reference <dbl>, moe <dbl>, pct_lo <dbl>, pct_hi <dbl>,
## #   di_indicator <dbl>