Introduction

Package Overview

Often the first obstacle to understanding the generalized linear model in a practical way is finding good data. Common problems are that the data set has too few rows, the response variable does not follow a family in the glm framework, or the data is messy and needs a lot of work before statistical analysis can begin. This package removes these obstacles by letting you create the data you want. With data in hand, you can answer many questions about the model empirically.

The goal of this package is to strike a balance between mathematical flexibility and simplicity of use. You can control the sample size, link function, number of unrelated variables, and dispersion for continuous distributions. Default values are carefully chosen so data can be generated without thinking about mathematical connections between weights, links, and distributions.
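As a rough illustration of these controls, the sketch below sets several of them in one call. The argument names `link`, `unrelated`, and `ancillary` are assumptions based on the package documentation, and the values are arbitrary; check `?simulate_gaussian` for the signature in your installed version.

```r
library(GlmSimulatoR)

set.seed(1)

# Assumed arguments (verify with ?simulate_gaussian):
#   N         - sample size
#   link      - link function
#   weights   - true weights, one per predictor
#   unrelated - number of extra variables unrelated to Y
#   ancillary - dispersion for continuous distributions
small_data <- simulate_gaussian(
  N = 500,
  link = "identity",
  weights = c(1, 2),
  unrelated = 2,
  ancillary = 0.5
)

# Inspect the number of rows and the columns that were generated
dim(small_data)
head(small_data)
```

Any argument left out falls back to its default, so `simulate_gaussian()` on its own should also return a usable data set.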

The examples below demonstrate how to use this package to answer questions about the glm framework.

Question 1: Is A Sample Size Of 200 Enough To Get Close Estimates Of The True Weights?

library(GlmSimulatoR)
library(ggplot2)

set.seed(1)
simdata <- simulate_gaussian(N = 200, weights = c(1, 2, 3))

glmModel <- glm(Y ~ X1 + X2 + X3, data = simdata, family = gaussian(link = "identity"))
summary(glmModel)
#> 
#> Call:
#> glm(formula = Y ~ X1 + X2 + X3, family = gaussian(link = "identity"), 
#>     data = simdata)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.9437  -0.7212  -0.0368   0.7070   3.6459  
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.9138     0.7012   4.156 4.84e-05 ***
#> X1            0.9834     0.2868   3.428  0.00074 ***
#> X2            1.7882     0.2702   6.619 3.39e-10 ***
#> X3            3.2822     0.2640  12.430  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for gaussian family taken to be 1.178874)
#> 
#>     Null deviance: 479.50  on 199  degrees of freedom
#> Residual deviance: 231.06  on 196  degrees of freedom
#> AIC: 606.45
#> 
#> Number of Fisher Scoring iterations: 2

rm(glmModel)

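The estimates for X1, X2, and X3 (0.98, 1.79, and 3.28) are close to the weights argument (1, 2, and 3), and each true weight lies within two standard errors of its estimate. With 200 rows, the mathematics behind the generalized linear model worked well.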
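One way to make "close" more concrete is to check whether an approximate 95% interval around each estimate covers the true weight. The following is a minimal sketch that refits the same model; the names `true_weights`, `fit`, and `comparison` are introduced here for illustration, and the interval is the usual estimate plus or minus a t quantile times the standard error.

```r
library(GlmSimulatoR)

set.seed(1)
true_weights <- c(1, 2, 3)
simdata <- simulate_gaussian(N = 200, weights = true_weights)

fit <- glm(Y ~ X1 + X2 + X3, data = simdata, family = gaussian(link = "identity"))

# Coefficient table for the predictors only (intercept row dropped)
coef_table <- summary(fit)$coefficients[c("X1", "X2", "X3"), ]
estimate <- coef_table[, "Estimate"]
std_error <- coef_table[, "Std. Error"]

# t quantile for a two-sided 95% interval on the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))

comparison <- data.frame(
  true_weight = true_weights,
  estimate = estimate,
  lower = estimate - t_crit * std_error,
  upper = estimate + t_crit * std_error
)

# TRUE where the interval contains the weight used to simulate the data
comparison$covers_truth <- comparison$lower <= comparison$true_weight &
  comparison$true_weight <= comparison$upper

comparison
```

Rerunning this with a different `N` shows how the interval widths change with sample size.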