Statistics for Mixed Effects Models

Daniel Lüdecke

2019-01-05

Statistics and Measures for Mixed Effects Models

This vignette demonstrates those functions of the sjstats-package that deal especially with mixed effects models.

Before we start, we fit a simple linear mixed model:

library(sjstats)
library(lme4)
# load sample data
data(sleepstudy)

# fit linear mixed model
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

set.seed(2018)
sleepstudy$mygrp <- sample(1:45, size = 180, replace = TRUE)
m2 <- lmer(Reaction ~ Days + (1 | mygrp) + (1 | Subject), sleepstudy)

Sample Size Calculation for Mixed Models

The first two functions, deff() and smpsize_lmm(), can be used to approximately calculate the sample size in the context of power calculation. Calculating the sample size for simple linear models is pretty straightforward; for (linear) mixed models, however, statistical power is affected by the change in the variance of the test statistics. This is what Hsieh et al. (2003) call a design effect (or variance inflation factor, VIF). Once this design effect is calculated, the sample size calculated for a standard design can be adjusted accordingly.

Design Effect for Two-Level Mixed Models

deff() computes this design effect for linear mixed models with a two-level design. It requires the approximated average number of observations per grouping cluster (i.e. level-2 unit) and the assumed intraclass correlation coefficient (ICC) for the multilevel model. Typically, the minimum assumed value for the ICC is 0.05.

# Design effect for two-level model with 30 observations per
# cluster group (level-2 unit) and an assumed intraclass
# correlation coefficient of 0.05.
deff(n = 30)
#> [1] 2.45

# Design effect for two-level model with 24 observations per cluster
# group and an assumed intraclass correlation coefficient of 0.2.
deff(n = 24, icc = 0.2)
#> [1] 5.6
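Under the hood, this design effect follows the standard formula from Hsieh et al. (2003), 1 + (n - 1) * ICC, which can be verified by hand (deff_manual below is an illustrative helper, not part of sjstats):

```r
# Design effect for a two-level design (Hsieh et al. 2003):
# 1 + (n - 1) * icc, with n the average cluster size and
# icc the assumed intraclass correlation coefficient.
deff_manual <- function(n, icc = 0.05) {
  1 + (n - 1) * icc
}

deff_manual(30)             # 2.45, as returned by deff(n = 30)
deff_manual(24, icc = 0.2)  # 5.6, as returned by deff(n = 24, icc = 0.2)
```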

Calculating the Sample Size for Linear Mixed Models

smpsize_lmm() combines the power-calculation functions from the pwr-package with the design effect from deff(). It computes an approximate sample size for linear mixed models (two-level designs), based on the power calculation for a standard design and adjusted by the design effect for two-level designs.

# Sample size for multilevel model with 30 cluster groups and a small to
# medium effect size (Cohen's d) of 0.3. 27 subjects per cluster and
# hence a total sample size of about 802 observations is needed.
smpsize_lmm(eff.size = .3, k = 30)
#> $`Subjects per Cluster`
#> [1] 27
#> 
#> $`Total Sample Size`
#> [1] 802

# Sample size for multilevel model with 20 cluster groups and a medium
# to large effect size for linear models of 0.2. Five subjects per cluster and
# hence a total sample size of about 107 observations is needed.
smpsize_lmm(eff.size = .2, df.n = 5, k = 20, power = .9)
#> $`Subjects per Cluster`
#> [1] 5
#> 
#> $`Total Sample Size`
#> [1] 107
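As a rough illustration of how the two steps fit together, the standard-design sample size can be multiplied by the design effect. This is a sketch, not the exact internal algorithm of smpsize_lmm(), and the value of n_std below is an assumed placeholder:

```r
# Sketch: adjust a standard-design sample size by the design
# effect. The value n_std would come from a conventional power
# analysis, e.g. pwr::pwr.t.test(d = 0.3, power = 0.8); the
# number 351 here is an assumed placeholder for illustration.
n_std <- 351                  # assumed standard-design total N
d_eff <- 1 + (27 - 1) * 0.05  # 27 subjects per cluster, ICC = 0.05
ceiling(n_std * d_eff)        # design-effect-adjusted total N
```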

There are more sophisticated ways to perform power calculations for multilevel models; however, most of these require very detailed knowledge about the sample characteristics and involve simulation studies. smpsize_lmm() is a more pragmatic alternative to these approaches.

Troubleshooting

Sometimes, when fitting mixed models, convergence warnings or warnings about singularity may come up (see details on these issues in this FAQ). These warnings sometimes arise due to strict thresholds and may be safely ignored, but sometimes they indicate overly complex models or models with a poorly defined random structure. converge_ok() and is_singular() may help to check whether such a warning is problematic or not.

converge_ok() provides an alternative convergence test for merMod-objects (with a less strict threshold, as suggested by one of the lme4-package authors), while is_singular() can be used in case of post-fitting convergence warnings, such as warnings about negative eigenvalues of the Hessian. Typically, you want TRUE for converge_ok() and non-singular models (i.e. FALSE for is_singular()).

converge_ok(m)
#> 1.79669205387819e-07 
#>                 TRUE

is_singular(m)
#> [1] FALSE

Regarding singular models, there are some concerns that should be checked: singular fits are likely to occur when the number of random-effect levels is small or for complex random-effects models, e.g. models with several different random-slope terms. There are several (partly contradicting) proposals for how to deal with singularity, although there is no consensus about the best approach. See ?is_singular and ?lme4::isSingular for further details.

Rescale model weights for complex samples

Most functions for fitting multilevel and mixed effects models only allow frequency weights to be specified, but not design (i.e. sampling or probability) weights, which should be used when analyzing complex samples and survey data.

scale_weights() implements an algorithm proposed by Asparouhov (2006) and Carle (2009) to rescale design weights in survey data to account for the grouping structure of multilevel models; the rescaled weights can then be used for multilevel modelling.

To calculate a weight vector that can be used in multilevel models, scale_weights() needs the data frame with the survey data as its x-argument. This data frame should contain 1) a cluster ID (argument cluster.id), which represents the strata of the survey data (the level-2 cluster variable), and 2) the probability weights (argument pweight), which represent the design or sampling weights of the survey data (the level-1 weights).

scale_weights() then returns the original data frame, including two new variables: svywght_a, where the sampling weights pweight are adjusted by a factor that represents the cluster size divided by the sum of sampling weights within each cluster; and svywght_b, whose adjustment factor is the sum of sampling weights within each cluster divided by the sum of squared sampling weights within each cluster (see Carle (2009), Appendix B, for details).
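The two adjustment factors can be sketched in base R as follows. rescale_weights_sketch is an illustrative reconstruction of the formulas from Carle (2009), not the actual implementation of scale_weights():

```r
# Illustrative base-R sketch of Carle's (2009) weight rescaling.
# Method A: weight times (cluster size / sum of weights in cluster).
# Method B: weight times (sum of weights / sum of squared weights).
rescale_weights_sketch <- function(data, cluster, pweight) {
  w  <- data[[pweight]]
  cl <- data[[cluster]]
  data$svywght_a <- w * ave(w, cl, FUN = function(x) length(x) / sum(x))
  data$svywght_b <- w * ave(w, cl, FUN = function(x) sum(x) / sum(x^2))
  data
}
```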

data(nhanes_sample)
scale_weights(nhanes_sample, SDMVSTRA, WTINT2YR)
#> # A tibble: 2,992 x 9
#>    total   age RIAGENDR RIDRETH1 SDMVPSU SDMVSTRA WTINT2YR svywght_a svywght_b
#>    <dbl> <dbl>    <dbl>    <dbl>   <dbl>    <dbl>    <dbl>     <dbl>     <dbl>
#>  1     1  2.2         1        3       2       31   97594.     1.57      1.20 
#>  2     7  2.08        2        3       1       29   39599.     0.623     0.525
#>  3     3  1.48        2        1       2       42   26620.     0.898     0.544
#>  4     4  1.32        2        4       2       33   34999.     0.708     0.550
#>  5     1  2           2        1       1       41   14746.     0.422     0.312
#>  6     6  2.2         2        4       1       38   28232.     0.688     0.516
#>  7   350  1.6         1        3       2       33   93162.     1.89      1.46 
#>  8    NA  1.48        2        3       1       29   82276.     1.29      1.09 
#>  9     3  2.28        2        4       1       41   24726.     0.707     0.523
#> 10    30  0.84        1        3       2       35   39895.     0.760     0.594
#> # ... with 2,982 more rows

P-Values

For linear mixed models, summary() in lme4 does not report p-values. Other objects from regression functions are not consistent in their output structure when reporting p-values. p_value() aims to return a standardized ("tidy") output for any regression model. The return value is always a data frame with three columns: term, p.value and std.error, which represent the term name, p-value and standard error for each term.

For linear mixed models, the approximation of p-values is more precise when p.kr = TRUE, based on conditional F-tests with Kenward-Roger approximation for the degrees of freedom (calling pbkrtest::get_Lb_ddf()).

# Using the t-statistics for calculating the p-value
p_value(m2)
#>          term p.value std.error
#> 1 (Intercept)       0     9.933
#> 2        Days       0     0.789

# p-values based on conditional F-tests with 
# Kenward-Roger approximation for the degrees of freedom
p_value(m2, p.kr = TRUE)
#>          term p.value std.error
#> 1 (Intercept)       0     9.945
#> 2        Days       0     0.796

To see more details on the degrees of freedom when using Kenward-Roger approximation, use the summary()-method:

pv <- p_value(m2, p.kr = TRUE)
summary(pv)
#>          term p.value std.error      df statistic
#> 1 (Intercept)       0     9.945  23.152    25.241
#> 2        Days       0     0.796 154.722    13.133

Random Effect Variances

In mixed effects models, several random effect variances can be calculated, depending on the model specification: the within-group (residual) variance, the between-group variance (random intercept variance), the random-slope variance, and the slope-intercept covariance and correlation.

You can access one of these values with get_re_var(), or all of them with re_var():

# get residual variance
get_re_var(m, "sigma_2")
#> [1] 654.941

# get all random effect variances
re_var(m)
#>       Within-group-variance:  654.941
#>      Between-group-variance:  612.090 (Subject)
#>       Random-slope-variance:   35.072 (Subject.Days)
#>  Slope-Intercept-covariance:    9.604 (Subject.(Intercept))
#> Slope-Intercept-correlation:    0.066 (Subject)

Further variance components can be calculated using adjusted = TRUE. In this case, re_var() returns the variance attributable to the fixed effects, the variance of the random effects, the variance due to additive dispersion and the distribution-specific variance. The residual variance is the sum of the dispersion and distribution-specific variance (see Nakagawa et al. (2017) and Johnson et al. (2014)):

# get all variance components of mixed models
re_var(m, adjusted = TRUE)
#> 
#> Variance Components of Mixed Models
#> 
#> Family : gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#> 
#>          fixed:  908.953
#>         random: 1698.071
#>       residual:  654.941
#>     dispersion:    0.000
#>   distribution:  654.941

R-squared

Nakagawa et al. (2017) proposed a method to compute marginal and conditional r-squared values, which is implemented in the r2()-function. For mixed models, the marginal r-squared considers only the variance of the fixed effects, while the conditional r-squared takes both the fixed and random effects into account. r2() can be used with models fitted with the functions of the lme4 and glmmTMB packages.

r2(m)
#> 
#> R-Squared for Generalized Linear Mixed Model
#> 
#> Family : gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#> 
#>    Marginal R2: 0.279
#> Conditional R2: 0.799
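For this Gaussian model, both R-squared values can be reproduced by hand from the variance components reported by re_var(m, adjusted = TRUE) above:

```r
# Marginal and conditional R2 (Nakagawa et al. 2017), computed
# from the variance components reported earlier for model m.
v_fixed  <- 908.953   # variance of the fixed effects
v_random <- 1698.071  # variance of the random effects
v_resid  <- 654.941   # residual variance

round(v_fixed / (v_fixed + v_random + v_resid), 3)               # marginal R2: 0.279
round((v_fixed + v_random) / (v_fixed + v_random + v_resid), 3)  # conditional R2: 0.799
```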

Intraclass-Correlation Coefficient

The components of the random effect variances are of interest when calculating the intraclass-correlation coefficient, ICC. The ICC is calculated by dividing the between-group-variance (random intercept variance) by the total variance (i.e. sum of between-group-variance and within-group (residual) variance). The ICC can be interpreted as “the proportion of the variance explained by the grouping structure in the population” (Hox 2002: 15).

Usually, the ICC is calculated for the null model (“unconditional model”). However, according to Raudenbush and Bryk (2002) or Rabe-Hesketh and Skrondal (2012) it is also feasible to compute the ICC for full models with covariates (“conditional models”) and compare how much a level-2 variable explains the portion of variation in the grouping structure (random intercept).

The ICC for mixed models can be computed with icc(). Caution: for random-slope-intercept models, the ICC differs at each unit of the predictors. Hence, the ICC for these kinds of models cannot be understood simply as a proportion of variance (see Goldstein et al. 2010). For convenience, as the icc() function is also used to extract the different random effect variances (see re_var() above), the ICC for random-slope-intercept models is reported nonetheless, but it is usually not a meaningful summary of the proportion of variances. To get a meaningful ICC also for models with random slopes, use adjusted = TRUE. The adjusted ICC uses the mean random effect variance, which is based on the random effect variances for each value of the random slope (see Johnson 2014).

For three-level models, depending on the nested structure of the model, and for cross-classified models, icc() by default only reports the proportion of variance explained for each grouping level. Use adjusted = TRUE to calculate the adjusted and conditional ICC, which take all random effect variances into account.

icc(m)
#> Caution! ICC for random-slope-intercept models usually not meaningful. Use `adjusted = TRUE` to use the mean random effect variance to calculate the ICC. See 'Note' in `?icc`.
#> 
#> Intraclass Correlation Coefficient for Linear mixed model
#> 
#> Family : gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#> 
#>   ICC (Subject): 0.4831

icc(m2)
#> 
#> Intraclass Correlation Coefficient for Linear mixed model
#> 
#> Family : gaussian (identity)
#> Formula: Reaction ~ Days + (1 | mygrp) + (1 | Subject)
#> 
#>     ICC (mygrp): 0.0383
#>   ICC (Subject): 0.5981

If adjusted = TRUE, an adjusted and a conditional ICC are calculated, which take all sources of uncertainty (of all random effects) into account to report an “adjusted” ICC, as well as the conditional ICC. The latter also takes the fixed effects variances into account (see Nakagawa et al. 2017). If random effects are not nested and not cross-classified, the adjusted (adjusted = TRUE) and unadjusted (adjusted = FALSE) ICC are identical.

icc(m, adjusted = TRUE)
#> 
#> Intraclass Correlation Coefficient for Generalized Linear Mixed Model
#> 
#> Family : gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#> 
#>    Adjusted ICC: 0.7217
#> Conditional ICC: 0.5206
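Again, both values can be reproduced from the variance components shown by re_var(m, adjusted = TRUE) above: the adjusted ICC ignores the fixed effects variance, while the conditional ICC includes it in the denominator:

```r
# Adjusted and conditional ICC computed manually from the
# variance components reported earlier for model m.
v_fixed  <- 908.953   # variance of the fixed effects
v_random <- 1698.071  # variance of the random effects
v_resid  <- 654.941   # residual variance

round(v_random / (v_random + v_resid), 4)            # adjusted ICC: 0.7217
round(v_random / (v_fixed + v_random + v_resid), 4)  # conditional ICC: 0.5206
```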

icc(m2, adjusted = TRUE)
#> 
#> Intraclass Correlation Coefficient for Generalized Linear Mixed Model
#> 
#> Family : gaussian (identity)
#> Formula: Reaction ~ Days + (1 | mygrp) + (1 | Subject)
#> 
#>    Adjusted ICC: 0.6363
#> Conditional ICC: 0.4609

References

Asparouhov T. 2006. General Multi-Level Modeling with Sampling Weights. Communications in Statistics—Theory and Methods (35): 439–460

Barr DJ, Levy R, Scheepers C, Tily HJ. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3):255–278

Bates D, Kliegl R, Vasishth S, Baayen H. 2015. Parsimonious Mixed Models. arXiv:1506.04967

Carle AC. 2009. Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology 9(49): 1-13

Goldstein H, Browne W, Rasbash J. 2010. Partitioning Variation in Multilevel Models. Understanding Statistics, 1:4, 223-231, doi: 10.1207/S15328031US0104_02

Hox J. 2002. Multilevel analysis: techniques and applications. Mahwah, NJ: Erlbaum

Hsieh FY, Lavori PW, Cohen HJ, Feussner JR. 2003. An Overview of Variance Inflation Factors for Sample-Size Calculation. Evaluation & the Health Professions 26: 239–257. doi: 10.1177/0163278703255230

Johnson PC, O’Hara RB. 2014. Extension of Nakagawa & Schielzeth’s R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946. doi: 10.1111/2041-210X.12225

Matuschek H, Kliegl R, Vasishth S, Baayen H, Bates D. 2017. Balancing type I error and power in linear mixed models. Journal of Memory and Language, 94:305–315

Nakagawa S, Johnson P, Schielzeth H. 2017. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisted and expanded. J. R. Soc. Interface 14. doi: 10.1098/rsif.2017.0213

Rabe-Hesketh S, Skrondal A. 2012. Multilevel and longitudinal modeling using Stata. 3rd ed. College Station, Tex: Stata Press Publication

Raudenbush SW, Bryk AS. 2002. Hierarchical linear models: applications and data analysis methods. 2nd ed. Thousand Oaks: Sage Publications