Statistics for Bayesian Models

Daniel Lüdecke

2019-03-15

This vignette demonstrates those functions of the sjstats-package that deal especially with Bayesian models. sjstats provides the following functions:

Before we start, we fit some models, including a mediation-object from the mediation-package, which we use for comparison with brms. The functions work with brmsfit, stanreg and stanfit-objects.

library(sjstats)
library(sjmisc)
library(mediation)
library(brms)

# load sample data
data(jobs)
data(efc)
data(fish)
efc <- to_factor(efc, e42dep, c172code, c161sex, e15relat)
set.seed(123)

# linear models, for mediation analysis
b1 <- lm(job_seek ~ treat + econ_hard + sex + age, data = jobs)
b2 <- lm(depress2 ~ treat + job_seek + econ_hard + sex + age, data = jobs)

# mediation analysis, for comparison with brms
m1 <- mediate(b1, b2, sims = 1000, treat = "treat", mediator = "job_seek")

# fit Bayesian models
f1 <- bf(job_seek ~ treat + econ_hard + sex + age)
f2 <- bf(depress2 ~ treat + job_seek + econ_hard + sex + age)

m2 <- brm(f1 + f2 + set_rescor(FALSE), data = jobs, cores = 4)

m3 <- brm(
  bf(count ~ child + camper + (1 | persons), 
     zi ~ child + camper + (1 | persons)),
  data = fish,
  family = zero_inflated_poisson(),
  cores = 4
)

m4 <- brm(
  mpg ~ wt + hp + (1 | cyl) + (1 + wt | gear), 
  data = mtcars, 
  cores = 4
)

m5 <- brm(
  neg_c_7 ~ e42dep + c12hour + c172code + (1 | e15relat),
  data = efc,
  cores = 4
)

Highest Density Interval

hdi() computes the highest density interval for posterior samples. Unlike equal-tailed intervals, which exclude 2.5% from each tail of the distribution, the HDI is not equal-tailed and therefore always includes the mode(s) of the posterior distribution.

By default, hdi() prints the 90% intervals; however, the prob-argument can be used to calculate different or even multiple intervals.

hdi(m2)
#> 
#> # Highest Density Interval
#> 
#>                            HDI(90%)
#>  b_jobseek_Intercept  [ 3.47  3.88]
#>  b_depress2_Intercept [ 1.97  2.46]
#>  b_jobseek_treat      [-0.02  0.15]
#>  b_jobseek_econ_hard  [ 0.01  0.09]
#>  b_jobseek_sex        [-0.09  0.07]
#>  b_jobseek_age        [ 0.00  0.01]
#>  b_depress2_treat     [-0.11  0.03]
#>  b_depress2_job_seek  [-0.29 -0.19]
#>  b_depress2_econ_hard [ 0.12  0.18]
#>  b_depress2_sex       [ 0.04  0.17]
#>  b_depress2_age       [-0.00  0.00]
#>  sigma_jobseek        [ 0.70  0.76]
#>  sigma_depress2       [ 0.59  0.64]

hdi(m2, prob = c(.5, .89))
#> 
#> # Highest Density Interval
#> 
#>                            HDI(50%)      HDI(89%)
#>  b_jobseek_Intercept  [ 3.59  3.76] [ 3.47  3.86]
#>  b_depress2_Intercept [ 2.13  2.32] [ 1.97  2.44]
#>  b_jobseek_treat      [ 0.03  0.10] [-0.02  0.15]
#>  b_jobseek_econ_hard  [ 0.04  0.07] [ 0.02  0.09]
#>  b_jobseek_sex        [-0.04  0.03] [-0.09  0.07]
#>  b_jobseek_age        [ 0.00  0.01] [ 0.00  0.01]
#>  b_depress2_treat     [-0.08 -0.02] [-0.11  0.03]
#>  b_depress2_job_seek  [-0.26 -0.22] [-0.29 -0.19]
#>  b_depress2_econ_hard [ 0.13  0.16] [ 0.11  0.18]
#>  b_depress2_sex       [ 0.08  0.13] [ 0.04  0.17]
#>  b_depress2_age       [-0.00  0.00] [-0.00  0.00]
#>  sigma_jobseek        [ 0.71  0.74] [ 0.70  0.75]
#>  sigma_depress2       [ 0.60  0.62] [ 0.59  0.64]

For multilevel models, the type-argument defines whether the HDIs of fixed, random or all effects are shown.

hdi(m5, type = "random")
#> 
#> # Highest Density Interval
#> 
#>                              HDI(90%)
#>  r_e15relat.1.Intercept. [-0.12 1.31]
#>  r_e15relat.2.Intercept. [-0.16 1.00]
#>  r_e15relat.3.Intercept. [-0.84 0.77]
#>  r_e15relat.4.Intercept. [-0.57 0.78]
#>  r_e15relat.5.Intercept. [-0.76 0.87]
#>  r_e15relat.6.Intercept. [-1.61 0.33]
#>  r_e15relat.7.Intercept. [-1.28 0.76]
#>  r_e15relat.8.Intercept. [-0.84 0.49]

The computation of the HDI is based on the code from Kruschke 2015, pp. 727f. For the default sampling in Stan (4000 samples), 90% intervals for the HDI are more stable than, for instance, 95% intervals. An effective sample size of at least 10,000 is recommended if 95% intervals should be computed (see Kruschke 2015, p. 183ff).
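The underlying idea can be sketched for a single vector of posterior samples: sort the draws and pick the narrowest window that contains the requested proportion of them. The following function is only a minimal illustration of this logic (its name is made up for this example); hdi() itself works directly on model objects.

# minimal sketch of the HDI logic for one vector of posterior samples
hdi_sketch <- function(x, prob = 0.9) {
  x <- sort(x)
  n <- length(x)
  window <- ceiling(prob * n)                 # number of draws the interval must cover
  widths <- x[(window + 1):n] - x[1:(n - window)]
  best <- which.min(widths)                   # narrowest such interval
  c(lower = x[best], upper = x[best + window])
}

# e.g. the 90% HDI of a skewed sample
hdi_sketch(rgamma(4000, shape = 2), prob = 0.9)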

Region of Practical Equivalence (ROPE)

Unlike the frequentist approach, Bayesian inference is not based on statistical significance, where effects need to be different from “zero”. Rather, the magnitude of a parameter’s value and its uncertainty should not be ignored; hence, an effect is not considered present merely because it differs from zero, but only if it lies outside a specific range that can be considered as “practically no effect”. This range is called the region of practical equivalence (ROPE).

rope() requires the rope-argument, which defines this region, and then summarizes, for each parameter, the proportion of its posterior samples that lies inside and outside this ROPE.

rope(m5, rope = c(-1, 1))
#> 
#> # Proportions of samples inside and outside the ROPE
#> 
#>              inside outside
#>  b_Intercept   0.0%  100.0%
#>  b_e42dep2    43.0%   57.0%
#>  b_e42dep3     0.6%   99.5%
#>  b_e42dep4     0.0%  100.0%
#>  b_c12hour   100.0%    0.0%
#>  b_c172code2  99.7%    0.3%
#>  b_c172code3  79.0%   21.0%
#>  sigma         0.0%  100.0%
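For a single parameter, these proportions are simply the share of posterior draws falling into the ROPE. As a rough check against the output above (details such as whether the limits are treated as inclusive may differ slightly):

# share of posterior draws of b_c172code3 inside the ROPE [-1, 1]
post <- posterior_samples(m5, pars = "b_c172code3")
mean(post$b_c172code3 >= -1 & post$b_c172code3 <= 1)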

rope() does not suggest limits for the region of practical equivalence and does not tell you how big a value is practically equivalent to the null value. However, there are suggestions on how to choose reasonable limits (see Kruschke 2018), which are implemented in the equi_test() function.

Test for Practical Equivalence

equi_test() combines the two functions hdi() and rope() and performs a “HDI+ROPE decision rule” (Test for Practical Equivalence) (Kruschke 2018) to check whether parameter values should be accepted or rejected against the background of a formulated null hypothesis.

equi_test() computes the 95%-HDI and checks whether a model predictor’s HDI lies completely outside, completely inside or partially inside the ROPE. If the HDI is completely outside the ROPE, the “null hypothesis” for this parameter is “rejected”. If the ROPE completely covers the HDI, i.e. all most credible values of a parameter are inside the region of practical equivalence, the null hypothesis is accepted. Otherwise, it is undecided whether to accept or reject the null hypothesis. In short, desirable results are low proportions inside the ROPE (the closer to zero the better), so that the H0 can be rejected.

If neither the rope nor the eff_size argument is specified, the effect size will be set to 0.1 (half of a small effect according to Cohen) and the ROPE is then 0 +/- .1 * sd(y) for linear models. This is the suggested way to specify the ROPE limits according to Kruschke (2018).
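Assuming neg_c_7 is the (numeric) response of m5, these default limits can be reproduced by hand; the result should roughly match the ROPE reported in the output below.

# default ROPE limits for a linear model: 0 +/- 0.1 * sd(y)
round(c(-1, 1) * 0.1 * sd(efc$neg_c_7, na.rm = TRUE), 2)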

equi_test(m5)
#> 
#> # Test for Practical Equivalence of Model Predictors
#> 
#>   Effect Size: 0.10
#>          ROPE: [-0.39 0.39]
#>       Samples: 4000
#> 
#>                         H0 %inROPE     HDI(95%)
#>  b_Intercept (*)    reject    0.00 [ 7.58 9.93]
#>  b_e42dep2 (*)   undecided    7.47 [-0.01 2.05]
#>  b_e42dep3 (*)      reject    0.00 [ 1.30 3.25]
#>  b_e42dep4 (*)      reject    0.00 [ 2.77 4.91]
#>  b_c12hour          accept  100.00 [ 0.00 0.01]
#>  b_c172code2 (*) undecided   75.93 [-0.44 0.77]
#>  b_c172code3 (*) undecided   20.50 [-0.07 1.44]
#>  sigma              reject    0.00 [ 3.42 3.76]
#> 
#> (*) the number of effective samples may be insufficient for some parameters

For models with binary outcomes, there is no concrete way to derive the effect size that defines the ROPE limits. Two examples from Kruschke suggest that a negligible change is about .05 on the logit-scale. In these cases, it is recommended to specify the rope argument; if it is not specified, the ROPE limits are calculated as suggested by Kruschke: the effect size is the probability of “success” for the outcome, divided by pi. For all other models, 0 +/- .1 * sd(intercept) is used to determine the ROPE limits.

Besides the numerical output, the results can also be printed as an HTML-table or plotted, using the out-argument. For plots, the 95% distributions of the posterior samples are shown, the ROPE is a light-blue shaded region in the plot, and the distributions are colored depending on whether the parameter values are accepted, rejected or undecided.

equi_test(m5, out = "plot")

Tidy Summary of Bayesian Models

tidy_stan() is not a substitute for, but rather a convenient alternative to, summary(). The major differences are: tidy_stan()

tidy_stan(m3)
#> 
#> # Summary Statistics of Stan-Model
#> 
#> ## Conditional Model:
#> 
#>            estimate std.error      HDI(89%) ratio rhat mcse
#>  Intercept     1.27      0.69 [-0.24  2.65]  0.24    1 0.03
#>  child        -1.15      0.09 [-1.29 -0.99]  0.65    1 0.00
#>  camper        0.73      0.10 [ 0.59  0.88]  1.00    1 0.00
#> 
#> ## Zero-Inflated Model:
#> 
#>            estimate std.error      HDI(89%) ratio rhat mcse
#>  Intercept    -0.68      0.67 [-1.85  0.50]  0.41    1 0.02
#>  child         1.90      0.33 [ 1.40  2.46]  0.80    1 0.01
#>  camper       -0.85      0.38 [-1.38 -0.25]  0.32    1 0.01

Additional statistics in the output are:

By default, the “estimate” is the median of the posterior distribution, but this can be changed with the typical-argument.

tidy_stan(m3, typical = "mean")
#> 
#> # Summary Statistics of Stan-Model
#> 
#> ## Conditional Model:
#> 
#>            estimate std.error      HDI(89%) ratio rhat mcse
#>  Intercept     1.28      0.69 [-0.24  2.65]  0.24    1 0.03
#>  child        -1.15      0.09 [-1.29 -0.99]  0.65    1 0.00
#>  camper        0.73      0.10 [ 0.59  0.88]  1.00    1 0.00
#> 
#> ## Zero-Inflated Model:
#> 
#>            estimate std.error      HDI(89%) ratio rhat mcse
#>  Intercept    -0.67      0.67 [-1.85  0.50]  0.41    1 0.02
#>  child         1.89      0.33 [ 1.40  2.46]  0.80    1 0.01
#>  camper       -0.85      0.38 [-1.38 -0.25]  0.32    1 0.01

To also show random effects of multilevel models, use the type-argument.

# printing fixed and random effects of multilevel model
tidy_stan(m3, type = "all")
#> 
#> # Summary Statistics of Stan-Model
#> 
#> ## Conditional Model: Fixed effects
#> 
#>            estimate std.error      HDI(89%) ratio rhat mcse
#>  Intercept     1.27      0.69 [-0.24  2.65]  0.24    1 0.03
#>  child        -1.15      0.09 [-1.29 -0.99]  0.65    1 0.00
#>  camper        0.73      0.10 [ 0.59  0.88]  1.00    1 0.00
#> 
#> ## Conditional Model: Random effect (Intercept: persons)
#> 
#>            estimate std.error      HDI(89%) ratio rhat mcse
#>  persons.1    -1.33      0.73 [-2.75  0.19]  0.24    1 0.03
#>  persons.2    -0.36      0.71 [-1.67  1.23]  0.24    1 0.03
#>  persons.3     0.35      0.71 [-1.11  1.79]  0.24    1 0.03
#>  persons.4     1.25      0.70 [-0.19  2.69]  0.23    1 0.03
#> 
#> ## Zero-Inflated Model: Fixed effects
#> 
#>            estimate std.error      HDI(89%) ratio rhat mcse
#>  Intercept    -0.68      0.67 [-1.85  0.50]  0.41    1 0.02
#>  child         1.90      0.33 [ 1.40  2.46]  0.80    1 0.01
#>  camper       -0.85      0.38 [-1.38 -0.25]  0.32    1 0.01
#> 
#> ## Zero-Inflated Model: Random effect (Intercept: persons)
#> 
#>            estimate std.error      HDI(89%) ratio rhat mcse
#>  persons.1     1.26      0.72 [ 0.01  2.61]  0.36    1 0.02
#>  persons.2     0.34      0.66 [-0.85  1.50]  0.36    1 0.02
#>  persons.3    -0.15      0.67 [-1.37  1.02]  0.34    1 0.02
#>  persons.4    -1.33      0.75 [-2.55 -0.01]  0.35    1 0.02

By default, the 89%-HDI is computed (a convention following McElreath 2015), but other or even multiple HDIs can be computed using the prob argument.

# two different HDI for multivariate response model
tidy_stan(m2, prob = c(.5, .95))
#> 
#> # Summary Statistics of Stan-Model
#> 
#> ## Response: jobseek
#> 
#>            estimate std.error      HDI(50%)      HDI(95%) ratio rhat mcse
#>  Intercept     3.67      0.13 [ 3.59  3.76] [ 3.41  3.91]  1.73    1    0
#>  treat         0.06      0.05 [ 0.03  0.10] [-0.04  0.16]  1.98    1    0
#>  econ_hard     0.05      0.02 [ 0.04  0.07] [ 0.00  0.10]  1.65    1    0
#>  sex          -0.01      0.05 [-0.04  0.03] [-0.10  0.09]  1.82    1    0
#>  age           0.00      0.00 [ 0.00  0.01] [-0.00  0.01]  1.84    1    0
#> 
#> ## Response: depress2
#> 
#>            estimate std.error      HDI(50%)      HDI(95%) ratio rhat mcse
#>  Intercept     2.21      0.15 [ 2.13  2.32] [ 1.92  2.50]  1.72    1    0
#>  treat        -0.04      0.04 [-0.08 -0.02] [-0.13  0.04]  1.66    1    0
#>  job_seek     -0.24      0.03 [-0.26 -0.22] [-0.29 -0.18]  1.71    1    0
#>  econ_hard     0.15      0.02 [ 0.13  0.16] [ 0.11  0.19]  1.97    1    0
#>  sex           0.11      0.04 [ 0.08  0.13] [ 0.02  0.18]  1.58    1    0
#>  age           0.00      0.00 [-0.00  0.00] [-0.00  0.00]  1.33    1    0

Summary of Mediation Analysis

mediation() is another summary function, especially for mediation analysis, i.e. for multivariate response models with causal mediation effects.

Let us recall the models:

f1 <- bf(job_seek ~ treat + econ_hard + sex + age)
f2 <- bf(depress2 ~ treat + job_seek + econ_hard + sex + age)

m2 <- brm(f1 + f2 + set_rescor(FALSE), data = jobs, cores = 4)

Here, treat is the treatment effect, job_seek is the mediator effect, f1 describes the mediator model and f2 describes the outcome model.

mediation() returns a data frame with information on the direct effect (the median of the posterior samples of the treatment coefficient in the outcome model), the mediator effect (the median of the posterior samples of the mediator coefficient in the outcome model), the indirect effect (the median of the products of the posterior samples of the mediator coefficient in the outcome model and the posterior samples of the treatment coefficient in the mediator model) and the total effect (the median of the sums of the posterior samples used for the direct and indirect effect). The proportion mediated is the indirect effect divided by the total effect.
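These definitions can be sketched by hand from the posterior draws of m2 (parameter names as shown in the hdi() output above); this is only an illustration, mediation() performs the equivalent computation internally and adds the HDIs.

# extract the relevant posterior draws of m2
post <- posterior_samples(
  m2,
  pars = c("b_depress2_treat", "b_depress2_job_seek", "b_jobseek_treat")
)

direct   <- post$b_depress2_treat                            # treatment, outcome model
indirect <- post$b_depress2_job_seek * post$b_jobseek_treat  # mediator * treatment
total    <- direct + indirect

c(direct = median(direct), indirect = median(indirect), total = median(total))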

The simplest call just needs the model-object.

mediation(m2)
#> 
#> # Causal Mediation Analysis for Stan Model
#> 
#>   Treatment: treat
#>    Mediator: job_seek
#>    Response: depress2
#> 
#>                  Estimate    HDI (90%)
#>   Direct effect:    -0.04 [-0.11 0.03]
#> Indirect effect:    -0.02 [-0.04 0.01]
#>    Total effect:    -0.06 [-0.13 0.02]
#> 
#> Proportion mediated: 27.68% [-92.66% 148.02%]

Typically, mediation() finds the treatment and mediator variables automatically. If this does not work, use the treatment and mediator arguments to specify the related variable names. For all values, the 90% HDIs are calculated by default. Use prob to calculate a different interval.

Here is a comparison with the mediation package. Note that the summary()-output of the mediation package shows the indirect effect first, followed by the direct effect.

summary(m1)
#> 
#> Causal Mediation Analysis 
#> 
#> Quasi-Bayesian Confidence Intervals
#> 
#>                Estimate 95% CI Lower 95% CI Upper p-value
#> ACME            -0.0157      -0.0387         0.01    0.19
#> ADE             -0.0438      -0.1315         0.04    0.35
#> Total Effect    -0.0595      -0.1530         0.02    0.21
#> Prop. Mediated   0.2137      -2.0277         2.70    0.32
#> 
#> Sample Size Used: 899 
#> 
#> 
#> Simulations: 1000

mediation(m2, prob = .95)
#> 
#> # Causal Mediation Analysis for Stan Model
#> 
#>   Treatment: treat
#>    Mediator: job_seek
#>    Response: depress2
#> 
#>                  Estimate    HDI (95%)
#>   Direct effect:    -0.04 [-0.13 0.04]
#> Indirect effect:    -0.02 [-0.04 0.01]
#>    Total effect:    -0.06 [-0.15 0.03]
#> 
#> Proportion mediated: 27.68% [-225.15% 280.5%]

If you want to calculate the mean instead of the median of the posterior samples, use the typical-argument. Furthermore, there is a print()-method, which allows printing more digits.

mediation(m2, typical = "mean", prob = .95) %>% print(digits = 4)
#> 
#> # Causal Mediation Analysis for Stan Model
#> 
#>   Treatment: treat
#>    Mediator: job_seek
#>    Response: depress2
#> 
#>                  Estimate        HDI (95%)
#>   Direct effect:  -0.0397 [-0.1284 0.0438]
#> Indirect effect:  -0.0155 [-0.0392 0.0101]
#>    Total effect:  -0.0552 [-0.1502 0.0271]
#> 
#> Proportion mediated: 28.0921% [-224.7357% 280.92%]

As you can see, the results are similar to what the mediation package produces for non-Bayesian models.

ICC for multilevel models

As for frequentist multilevel models, icc() computes the intraclass correlation coefficient for Bayesian multilevel models. One advantage of Bayesian regression models is that the ICC can be computed for each sample of the posterior distribution, which makes it easy to calculate uncertainty intervals.

icc(m4)
#> 
#> # Random Effect Variances and ICC
#> 
#> Family: gaussian (identity)
#> 
#> ## cyl
#>           ICC:  0.26  HDI 89%: [0.00  0.71]
#> Between-group: 24.16  HDI 89%: [0.00 43.56]
#> 
#> ## gear
#>           ICC:  0.51  HDI 89%: [0.00   0.92]
#> Between-group: 45.96  HDI 89%: [0.00 104.20]
#> 
#> ## Residuals
#> Within-group: 6.44  HDI 89%: [3.27 8.95]
#> 
#> ## Random-slope-variance
#> gear: 7.71  HDI 89%: [0.00 16.89]

icc(m5)
#> 
#> # Random Effect Variances and ICC
#> 
#> Family: gaussian (identity)
#> 
#> ## e15relat
#>           ICC: 0.04  HDI 89%: [0.00 0.09]
#> Between-group: 0.55  HDI 89%: [0.00 1.24]
#> 
#> ## Residuals
#> Within-group: 12.81  HDI 89%: [11.80 13.82]
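For a model with a single grouping factor such as m5, this per-draw computation can be sketched directly from the posterior samples (parameter names assumed to follow the usual brms naming scheme); the median should be close to the ICC shown above.

# ICC per posterior draw for m5
post   <- posterior_samples(m5, pars = c("sd_e15relat__Intercept", "sigma"))
tau2   <- post[["sd_e15relat__Intercept"]]^2   # between-group variance per draw
sigma2 <- post[["sigma"]]^2                    # residual (within-group) variance per draw
icc_draws <- tau2 / (tau2 + sigma2)

median(icc_draws)
quantile(icc_draws, probs = c(.055, .945))     # equal-tailed 89% interval as a rough check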

For non-Gaussian models, there is no clean variance decomposition and hence the ICC can’t be calculated exactly. The general Bayesian way to analyse the random-effect variances is then to draw samples from the posterior predictive distribution, calculate the variances and compare how the variance changes when group-specific terms are included or dropped.

You can achieve this with the ppd-argument. In this case, draws from the posterior predictive distribution not conditioned on group-level terms (using posterior_predict(..., re.form = NA)) as well as draws from this distribution conditioned on all random effects (by default, unless specified otherwise in the re.form-argument) are taken. Then, the variances of each of these draws are calculated. The “ICC” is then the ratio between these two variances.
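Conceptually, this amounts to something like the following sketch for m4 (using re_formula, the brms equivalent of rstanarm’s re.form); the exact summary used by icc() may differ slightly from this simple aggregation.

# posterior predictive draws, without and with the cyl intercepts
ppd_fixed <- posterior_predict(m4, re_formula = NA)
ppd_rand  <- posterior_predict(m4, re_formula = ~(1 | cyl))

# variance of each posterior predictive draw (rows are draws)
var_fixed <- apply(ppd_fixed, 1, var)
var_rand  <- apply(ppd_rand, 1, var)

# variance ratio, comparable to an ICC
1 - mean(var_fixed) / mean(var_rand)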

icc(m4, ppd = TRUE, re.form = ~(1 | cyl), prob = .5)
#> 
#> # Random Effect Variances and ICC
#> 
#> Family: gaussian (identity)
#> Conditioned on: ~(1 | cyl)
#> 
#> ## Variance Ratio (comparable to ICC)
#> Ratio: 0.09  HDI 50%: [-0.05 0.32]
#> 
#> ## Variances of Posterior Predicted Distribution
#> Conditioned on fixed effects: 34.73  HDI 50%: [18.43 36.55]
#> Conditioned on rand. effects: 39.09  HDI 50%: [28.04 44.56]
#> 
#> ## Difference in Variances
#> Difference: 4.36  HDI 50%: [-2.99 10.87]

Sometimes, when the variance of the posterior predictive distribution is very large, the variance ratio in the output makes no sense, e.g. because it is negative. In such cases, it might help to use a more robust measure of central tendency for the variances. This can be done with the typical-argument.

# the "classical" ICC, not recommended for non-Gaussian
icc(m3)
#> 
#> # Random Effect Variances and ICC
#> 
#> Family: zero_inflated_poisson (log)
#> 
#> ## persons
#>           ICC: 0.68  HDI 89%: [0.43  0.98]
#> Between-group: 4.88  HDI 89%: [0.23 10.34]
#> 
#> ## Residuals
#> Within-group: 1.00  HDI 89%: [1.00 1.00]

# variance ratio based on posterior predictive distributions,
# which is negative and hence obviously nonsense
icc(m3, ppd = TRUE)
#> 
#> # Random Effect Variances and ICC
#> 
#> Family: zero_inflated_poisson (log)
#> Conditioned on: all random effects
#> 
#> ## Variance Ratio (comparable to ICC)
#> Ratio: -8.38  HDI 89%: [-0.92 1.00]
#> 
#> ## Variances of Posterior Predicted Distribution
#> Conditioned on fixed effects: 357.23  HDI 89%: [ 0.03 75.20]
#> Conditioned on rand. effects:  39.53  HDI 89%: [31.12 48.15]
#> 
#> ## Difference in Variances
#> Difference: -317.70  HDI 89%: [-39.18 48.93]

# use median instead of mean
icc(m3, ppd = TRUE, typical = "median")
#> 
#> # Random Effect Variances and ICC
#> 
#> Family: zero_inflated_poisson (log)
#> Conditioned on: all random effects
#> 
#> ## Variance Ratio (comparable to ICC)
#> Ratio: 0.70  HDI 89%: [-0.88 1.00]
#> 
#> ## Variances of Posterior Predicted Distribution
#> Conditioned on fixed effects: 11.71  HDI 89%: [ 0.04 74.93]
#> Conditioned on rand. effects: 39.33  HDI 89%: [30.75 48.09]
#> 
#> ## Difference in Variances
#> Difference: 26.50  HDI 89%: [-35.34 51.70]

Bayes r-squared and LOO-adjusted r-squared

r2() computes either the Bayes r-squared value, or - if loo = TRUE - a LOO-adjusted r-squared value (which comes conceptually closer to an adjusted r-squared measure).

For the Bayes r-squared, the standard error is also reported. Note that r2() uses the median as the measure of central tendency and the median absolute deviation as the measure of variability.

r2(m5)
#> 
#> R-Squared for (Generalized) Linear (Mixed) Model
#> 
#>       Bayes R2: 0.168
#> Standard Error: 0.022

r2(m5, loo = TRUE)
#> 
#> R-Squared for (Generalized) Linear (Mixed) Model
#> 
#> LOO-adjusted R2: 0.145
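For comparison, brms ships its own bayes_R2() method, which is based on the same Bayes R-squared measure but summarizes the posterior draws with the mean and reports quantile-based intervals, so the values should be close but not identical.

# Bayes R-squared as computed by brms, for comparison
bayes_R2(m5)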

References

Kruschke JK. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd edition. Academic Press, 2015

Kruschke JK. Rejecting or Accepting Parameter Values in Bayesian Estimation. Advances in Methods and Practices in Psychological Science. 2018; doi: 10.1177/2515245918771304

McElreath R. Statistical Rethinking. A Bayesian Course with Examples in R and Stan. Chapman and Hall, 2015

Norman GR, Sloan JA, Wyrwich KW. Interpretation of Changes in Health-related Quality of Life: The Remarkable Universality of Half a Standard Deviation. Medical Care. 2003;41: 582–592. doi: 10.1097/01.MLR.0000062554.74615.4C