Simulate Data from Generalized Linear Models - legacy code

2019-05-30

Simulated Logistic Models

The simglm package can simulate data from a variety of generalized linear models, both single-level and multilevel. These models are simulated with the sim_glm function rather than the sim_reg function.

As with the sim_reg function, one benefit of using this package for simulation is that the intermediate steps are returned along with the simulated outcome. This is useful for additional processing of the data, for example applying your own missing data function.
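As one illustration of such post-processing, the sketch below applies a simple missing-completely-at-random mechanism to a data frame. The helper add_mcar and its arguments are hypothetical (they are not part of simglm), and the small data frame only stands in for the output of a simulation function.

```r
# Hypothetical helper (not part of simglm): set a random share of the values
# in one column to NA, i.e. missing completely at random (MCAR).
add_mcar <- function(data, column, prop = 0.2) {
  n_missing <- floor(prop * nrow(data))
  drop_rows <- sample(nrow(data), size = n_missing)
  data[drop_rows, column] <- NA
  data
}

set.seed(1)
# Stand-in for simulated data; a real workflow would pass the simulated data frame.
toy <- data.frame(ID = 1:10, sim_data = rbinom(10, size = 1, prob = 0.5))
toy_missing <- add_mcar(toy, column = "sim_data", prop = 0.3)
sum(is.na(toy_missing$sim_data))  # 3 of the 10 values are now NA
```

Because the intermediate columns are kept in the returned data, a function like this could just as well make missingness depend on a covariate or on the simulated probability.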

Here is an example for simulating a single level logistic model:

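library(simglm)

# Fixed-effects formula and coefficients (intercept, act, diff)
fixed <- ~ 1 + act + diff
fixed_param <- c(2, 0.5, 0.3)
# Both covariates are drawn from normal distributions
cov_param <- list(dist_fun = c('rnorm', 'rnorm'),
                  var_type = c("single", "single"),
                  opts = list(list(mean = 0, sd = 4),
                              list(mean = 0, sd = 3)))
n <- 150  # sample size

# Simulate a single-level model with a binary (logistic) outcome
temp_single <- sim_glm(fixed = fixed, fixed_param = fixed_param,
                       cov_param = cov_param,
                       n = n, data_str = "single", outcome_type = 'logistic')
head(temp_single, n = 5)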
##   X.Intercept.       act      diff     Fbeta  logistic sim_data ID
## 1            1 -2.928576 -6.106825 -1.296335 0.2147824        0  1
## 2            1  4.247863 -2.550834  3.358681 0.9663880        1  2
## 3            1  1.325616 -1.264119  2.283572 0.9075073        1  3
## 4            1 -7.933378  2.787676 -1.130386 0.2440898        0  4
## 5            1 -4.042249 -5.794621 -1.759511 0.1468516        1  5

As the code above shows, the syntax is nearly identical to that of the sim_reg function. The largest difference is the omission of the error_var and rand_gen arguments. The returned data frame includes the linear predictor on the logit scale (Fbeta), the probability obtained as \(\frac{\exp(Fbeta)}{1 + \exp(Fbeta)}\) (logistic), and the simulated 0/1 outcome drawn with the rbinom function using those probabilities (sim_data).
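The three columns described above can be reproduced by hand in base R. The following is a minimal sketch of the same steps, not the package's internal code; the seed and object names are illustrative, so the values will not match the output shown earlier.

```r
set.seed(2019)
n <- 150
# Covariates drawn with the same distributions as in cov_param
act  <- rnorm(n, mean = 0, sd = 4)
diff <- rnorm(n, mean = 0, sd = 3)

# Linear predictor on the logit scale, using fixed_param = c(2, 0.5, 0.3)
Fbeta <- 2 + 0.5 * act + 0.3 * diff
# Inverse logit: exp(Fbeta) / (1 + exp(Fbeta))
logistic <- plogis(Fbeta)
# One Bernoulli draw per observation with those probabilities
sim_data <- rbinom(n, size = 1, prob = logistic)

by_hand <- data.frame(act, diff, Fbeta, logistic, sim_data)
head(by_hand, n = 5)
```

As a check against the printed output, the first row has Fbeta = -1.296335, and plogis(-1.296335) is approximately 0.2148, which agrees with its logistic value.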

Multilevel Logistic Models

Adding levels is straightforward and again very similar to the sim_reg function. Here is a two-level example with students nested in classrooms; the act variable is treated as a classroom-level variable:

##   X.Intercept.       diff       act      b0     Fbeta randEff logistic
## 1            1 -0.5808425 -1.109843 2.90443 1.3766259 2.90443 4.281056
## 2            1 -0.8688169 -1.109843 2.90443 1.2326387 2.90443 4.137069
## 3            1 -1.6730168 -1.109843 2.90443 0.8305388 2.90443 3.734969
## 4            1 -2.8333092 -1.109843 2.90443 0.2503925 2.90443 3.154823
## 5            1 -2.2453243 -1.109843 2.90443 0.5443850 2.90443 3.448815
##        prob sim_data withinID clustID
## 1 0.9863606        1        1       1
## 2 0.9842814        1        2       1
## 3 0.9766828        1        3       1
## 4 0.9590983        1        4       1
## 5 0.9691958        1        5       1
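In the output above, the logistic column equals Fbeta plus the classroom random effect (randEff), and prob is the inverse logit of that sum: for row 1, 1.3766259 + 2.90443 = 4.281056, and plogis(4.281056) is approximately 0.9864. The base R sketch below mimics that structure for a random-intercept model. The number of classrooms, the classroom size, the covariate distributions, and the random-effect variance are assumptions chosen for illustration; the coefficients reuse those of the single-level example.

```r
set.seed(42)
n_clust  <- 20   # number of classrooms (assumed)
n_within <- 10   # students per classroom (assumed)
clustID  <- rep(seq_len(n_clust), each = n_within)

# act varies only between classrooms; diff varies between students
act  <- rnorm(n_clust, mean = 0, sd = 4)[clustID]
diff <- rnorm(n_clust * n_within, mean = 0, sd = 3)

# One random intercept per classroom (variance of 3 is an assumption)
b0 <- rnorm(n_clust, mean = 0, sd = sqrt(3))[clustID]

Fbeta    <- 2 + 0.5 * diff + 0.3 * act   # fixed part of the model
logistic <- Fbeta + b0                   # add the classroom effect
prob     <- plogis(logistic)             # inverse logit
sim_data <- rbinom(length(prob), size = 1, prob = prob)

two_level <- data.frame(diff, act, b0, Fbeta, logistic, prob, sim_data, clustID)
head(two_level, n = 5)
```

Every student in a classroom shares the same act and b0 values, which is why those columns are constant across the five rows printed above.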

Three-Level Example

Below is a three-level example. The primary differences are the additional terms associated with the third level.

##   X.Intercept.       diff       act  actClust      b0_2     b0_3     Fbeta
## 1            1 -2.0036192 0.7191938 -1.815508 -2.165191 1.797921 0.5079254
## 2            1  0.3555894 0.7191938 -1.815508 -2.165191 1.797921 2.3952923
## 3            1 -1.6734751 0.7191938 -1.815508 -2.165191 1.797921 0.7720407
## 4            1 -0.6381194 0.7191938 -1.815508 -2.165191 1.797921 1.6003253
## 5            1 -2.0222088 0.7191938 -1.815508 -2.165191 1.797921 0.4930538
##     randEff randEff3  logistic      prob sim_data withinID clustID
## 1 -2.165191 1.797921 0.1406555 0.5351060        1        1       1
## 2 -2.165191 1.797921 2.0280224 0.8837080        1        2       1
## 3 -2.165191 1.797921 0.4047708 0.5998334        0        3       1
## 4 -2.165191 1.797921 1.2330554 0.7743529        1        4       1
## 5 -2.165191 1.797921 0.1257839 0.5314046        1        5       1
##   clust3ID
## 1        1
## 2        1
## 3        1
## 4        1
## 5        1
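The output above follows the same pattern with one more term: logistic equals Fbeta + randEff + randEff3 (for row 1, 0.5079254 - 2.165191 + 1.797921 = 0.1406555), and prob is its inverse logit. A minimal base R sketch of that nesting is given below. The group sizes, variances, and the reduced fixed part are assumptions for illustration only and are not taken from the original example.

```r
set.seed(7)
n3       <- 4   # level-3 units (assumed)
n2_per_3 <- 5   # level-2 clusters per level-3 unit (assumed)
n1_per_2 <- 6   # observations per level-2 cluster (assumed)

# Nested identifiers: each level-2 cluster belongs to exactly one level-3 unit
clust3ID <- rep(seq_len(n3), each = n2_per_3 * n1_per_2)
clustID  <- rep(seq_len(n3 * n2_per_3), each = n1_per_2)

b0_2 <- rnorm(n3 * n2_per_3, sd = 2)[clustID]  # level-2 random intercepts
b0_3 <- rnorm(n3, sd = 2)[clust3ID]            # level-3 random intercepts

diff     <- rnorm(length(clustID), sd = 3)
Fbeta    <- 2 + 0.5 * diff                     # fixed part (assumed coefficients)
logistic <- Fbeta + b0_2 + b0_3                # both random effects are added
prob     <- plogis(logistic)
sim_data <- rbinom(length(prob), size = 1, prob = prob)

three_level <- data.frame(diff, b0_2, b0_3, Fbeta, logistic, prob,
                          sim_data, clustID, clust3ID)
head(three_level, n = 5)
```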

Count Outcomes

The package can also simulate count outcomes in addition to 0/1 dichotomous outcomes. The syntax is the same as above, except that the outcome_type argument changes from 'logistic' to 'poisson'. As these values suggest, dichotomous outcomes are simulated from a logistic model, whereas count outcomes are simulated from a Poisson model. Below is an example adapted from the single-level logistic model above.

# Same covariates as the logistic example; only the intercept differs
fixed <- ~ 1 + act + diff
fixed_param <- c(-0.5, 0.5, 0.3)
cov_param <- list(dist_fun = c('rnorm', 'rnorm'),
                  var_type = c("single", "single"),
                  opts = list(list(mean = 0, sd = 4),
                              list(mean = 0, sd = 3)))
n <- 150  # sample size

# Simulate a single-level model with a count (Poisson) outcome
temp_single <- sim_glm(fixed = fixed, fixed_param = fixed_param,
                       cov_param = cov_param,
                       n = n, data_str = "single", outcome_type = 'poisson')
head(temp_single, n = 5)
##   X.Intercept.       act       diff       Fbeta   poisson sim_data ID
## 1            1 -1.655157 -0.2838826 -1.41274351 0.2434744        0  1
## 2            1  3.373535 -4.2538885 -0.08939913 0.9144805        0  2
## 3            1 -6.840285  5.9891692 -2.12339193 0.1196252        0  3
## 4            1  1.618381 -3.7367856 -0.81184524 0.4440380        0  4
## 5            1  1.314580  1.5309700  0.61658088 1.8525830        3  5
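In this output, the poisson column equals exp(Fbeta), as a log link implies: for row 1, exp(-1.41274351) is approximately 0.2435. The sketch below reproduces the steps in base R under the assumption that the counts in sim_data are then drawn with rpois using that value as the mean; it is not the package's internal code, and the seed and object names are illustrative.

```r
set.seed(2019)
n <- 150
# Covariates drawn with the same distributions as in cov_param
act  <- rnorm(n, mean = 0, sd = 4)
diff <- rnorm(n, mean = 0, sd = 3)

# Linear predictor on the log scale, using fixed_param = c(-0.5, 0.5, 0.3)
Fbeta <- -0.5 + 0.5 * act + 0.3 * diff
# Expected count: the inverse of the log link
poisson <- exp(Fbeta)
# One Poisson draw per observation (assumed sampling step)
sim_data <- rpois(n, lambda = poisson)

count_by_hand <- data.frame(act, diff, Fbeta, poisson, sim_data)
head(count_by_hand, n = 5)
```

Rows with a small expected count, such as the first four printed above, will usually yield a simulated count of 0.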