Quick Usage Guide

Colman Humphrey

2018-11-25

For even quicker usage instructions, see the readme.

The general idea of this package is that you can throw nearly any model built in the typical R fashion at it. All it needs is a somewhat standard way of extracting fixed effects, and that update works on the model, which is nearly always the case.

If your model is called base_model, for example, then the method will work as long as either coef(summary(base_model)) or coef(summary(base_model))$cond gives the estimates and standard errors.
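As a quick compatibility check, you can inspect what that extraction returns for your model. This sketch uses a plain lm from base R purely for illustration; it is not part of glmmboot:

```r
## sketch: verify the extraction pattern glmmboot relies on,
## using an ordinary lm fit on a built-in dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

## a matrix with estimates in the first column and standard
## errors in the second is the shape the method expects
coef(summary(fit))[, 1:2]

## update() must also work on the model object
refit <- update(fit, data = mtcars[1:20, ])
```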

Ideally you should supply the data yourself, although currently, if the dataframe used to build the model is accessible from base_model$model, base_model$frame or base_model@frame, you won't have to send the data in to the function separately. Be careful here: some models do store the data in one of these slots, but in a form that introduces bugs into the method. It's much safer to supply the data explicitly.

Let’s work through an example with glmmTMB. We’ll use the data that comes with glmmboot.

library(glmmboot)
data(test_data)

head(test_data)
#>       x_var1    x_var2     x_var3 subj          y
#> 1 -0.3603511 2.7427146  0.2636070    3 1.00000000
#> 2  2.6866123 1.5713332  0.1509874   20 0.08064748
#> 3  0.6342672 0.9936967  0.1261363   19 0.43073744
#> 4  1.1648519 1.6567148  0.6139765   33 0.98353502
#> 5  0.7842167 1.3459927 -0.7063382    1 1.00000000
#> 6  2.8261973 1.9342870  1.2697793    3 1.00000000

We’re assuming the x_var variables are fixed, and subj is to be treated as a random effect.

Thus our base analysis is:

library(glmmTMB)
model_formula <- as.formula(y ~ x_var1 + x_var2 + (1 | subj))

base_run <- glmmTMB(formula = model_formula,
                    data = test_data,
                    family = binomial)
#> Warning in eval(family$initialize): non-integer #successes in a binomial
#> glm!

We get a warning because the outcome data is proportional. Not to worry.

Now we’ll use the bootstrap. By default it performs block bootstrapping over the highest entropy random effect; here there’s only one, so of course the winner is subj.
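To give a rough feel for what “highest entropy” means here, the sketch below computes the Shannon entropy of a grouping factor’s level proportions. This is an illustration only, not necessarily the package’s internal code; the more evenly spread a factor’s groups are, the higher this value:

```r
## illustrative sketch: Shannon entropy of a grouping factor,
## computed from the proportions of observations in each level
group_entropy <- function(fac) {
  p <- as.numeric(table(fac)) / length(fac)
  -sum(p * log(p))
}

group_entropy(test_data$subj)
```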

bootstrap_over_subj <- BootGlmm(base_model = base_run,
                                resamples = 99, 
                                base_data = test_data,
                                num_cores = 4)
#> Performing block resampling, over subj

For publications and the like, it’s better to run around ten thousand resamples so that simulation noise has little effect on the results. 99 is of course far too small, and is used here only as an example.

And comparing results:

print(bootstrap_over_subj)
#>              estimate boot 2.5% boot 97.5%  boot p_value base p_value
#> (Intercept) 0.6018968   -0.1141     1.0557          0.18       0.1636
#> x_var1      0.1909343   -0.1395     0.5062          0.14       0.2338
#> x_var2      0.3140507    0.0874     0.5797          0.04       0.0396
#>             base 2.5% base 97.5% boot/base width
#> (Intercept)   -0.2449     1.4487       0.6907459
#> x_var1        -0.1234     0.5052       1.0272865
#> x_var2         0.0150     0.6131       0.8231729

This of course might take a long time. If it takes far too long on your machine, you can split the work across several computers. In that case we don’t want each computer to produce the fully processed output, only the intermediate results. To do this, we set return_coefs_instead = TRUE for each run:

b_list1 <- BootGlmm(base_model = base_run,
                    resamples = 29, 
                    base_data = test_data,
                    num_cores = 1,
                    return_coefs_instead = TRUE)
#> Performing block resampling, over subj
b_list2 <- BootGlmm(base_model = base_run,
                    resamples = 30, 
                    base_data = test_data,
                    num_cores = 1,
                    return_coefs_instead = TRUE)
#> Performing block resampling, over subj
b_list3 <- BootGlmm(base_model = base_run,
                    resamples = 30, 
                    base_data = test_data,
                    num_cores = 1,
                    return_coefs_instead = TRUE)
#> Performing block resampling, over subj
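If those runs really do happen on separate machines, one simple way to collect the intermediate results is with base R’s saveRDS and readRDS. This is a sketch rather than a glmmboot feature, and the file names here are hypothetical:

```r
## on each worker machine, after its run finishes
## (file name "boot_run_1.rds" is just an example):
saveRDS(b_list1, file = "boot_run_1.rds")

## on the machine doing the combining, read every
## saved run back into a single list of lists:
run_files <- list.files(pattern = "^boot_run_.*\\.rds$")
list_of_lists_output <- lapply(run_files, readRDS)
```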

Combining these is simple. With only a few result lists, we don’t need to mess around with building a further list; we can pass them straight to the combining function:

CombineResampledLists(b_list1, b_list2, b_list3)
#>              estimate boot 2.5% boot 97.5%  boot p_value base p_value
#> (Intercept) 0.6018968   -0.0398     1.2224        0.0889       0.1636
#> x_var1      0.1909343   -0.1418     0.5538        0.2222       0.2338
#> x_var2      0.3140507    0.0585     0.5161        0.0222       0.0396
#>             base 2.5% base 97.5% boot/base width
#> (Intercept)   -0.2449     1.4487       0.7452824
#> x_var1        -0.1234     0.5052       1.1065888
#> x_var2         0.0150     0.6131       0.7650518

If we’ve performed a huge number of such runs, it’s easier to first combine all the output into a list of lists, like so:

list_of_lists_output <- list(b_list1, b_list2, b_list3)

And we’ll get the same result:

CombineResampledLists(list_of_lists_output)
#>              estimate boot 2.5% boot 97.5%  boot p_value base p_value
#> (Intercept) 0.6018968   -0.0398     1.2224        0.0889       0.1636
#> x_var1      0.1909343   -0.1418     0.5538        0.2222       0.2338
#> x_var2      0.3140507    0.0585     0.5161        0.0222       0.0396
#>             base 2.5% base 97.5% boot/base width
#> (Intercept)   -0.2449     1.4487       0.7452824
#> x_var1        -0.1234     0.5052       1.1065888
#> x_var2         0.0150     0.6131       0.7650518