# Introduction

This vignette introduces the implementation of a power analysis for the Wald, LR, score and gradient test for linear hypotheses. It uses some IRT examples and treats basic as well as additional features of the package. It is directed towards beginner to intermediate level R users. Previous experience with the mirt package may be helpful. In the following, we will cover the two-step workflow in detail:

• Setup Hypotheses (setup.hypothesis)
• Perform the power analysis (irtpwr Function)

Finally, we provide some additional examples of more complex hypotheses for a 3PL model and a multidimensional model.

## Toy Example

We can load the irtpwr package using:

library(irtpwr)
library(mirt)
#> Loading required package: lattice

We want to know the power and the required sample size for a test of the Rasch vs 2PL model. We use the LSAT 7 data set which is included in the mirt package.

As a first step, we load the data set and fit a 2PL model.

dat <- expand.table(LSAT7)
mirtfit <- mirt(dat, 1, verbose = FALSE)

The 2PL parameters are then used as parameters for the alternative hypothesis in our hypothesis definition.

hyp <- setup.hypothesis(type = "1PLvs2PL", altpars = mirtfit)

We can now perform the power analysis. We want to determine the sample size necessary for a power of .8 while using an alpha level of .05.

res <- irtpwr(hyp = hyp, power = 0.8, alpha = 0.05)
summary(res)
#>
#>  Sample sizes for power = 0.8 (alpha = 0.05):
#>
#>  Statistic    N
#>       Wald 1134
#>         LR  887
#>      Score  932
#>
#> Method: Analytical

Here we see that the gradient statistic would be the most efficient to test our hypothesis: We would need a sample size of 778 to reach our desired power.

We can also plot the power curves to get a more detailed look on the relationships between sample size and power:

plot(res)

# Main Functions and Arguments

We will present the main functions and their arguments in more detail.

## setup.hypothesis

To setup a hypothesis, we can either use a fitted mirt model or specify alternative parameters directly. In both cases, we use the altpars argument of the setup.hypothesis function. An example of direct specification is:

altpars <- list(a = rlnorm(5, sdlog = 0.4), d = rnorm(5))
hyp <- setup.hypothesis(type = "1PLvs2PL", altpars = altpars)

You can use the type argument to specify which type of hypothesis test you want to perform.

The following types are currently available in the package:

• Rasch against 2PL (type = “1PLvs2PL”),
• DIF in 2PL (type = “DIF2PL”),
• PCM against GPCM (type = “PCMvsGPCM”).

We have already seen the Rasch against 2PL hypothesis setup in the toy example above. The procedure for the DIF in 2PL hypothesis is analogous, yet a bit more complicated since we need to define two groups. An example is:

group1 <- group2 <- list(a = rlnorm(5, sdlog = 0.2),
d = rnorm(5))

group2$a[1] <- (group2$a[1])^2
group2$d[1] <- group2$d[1] + 0.5

altpars <- list(group1, group2)

hyp <- setup.hypothesis(type = "DIF2PL", altpars = altpars)

Note that for both hypothesis types, we do not have to provide the parameters under the null hypothesis here, because they are implicitly defined by the alternative hypothesis. The setup.hypothesis function also allows for specifying parameters under the null hypothesis via a “nullpars” argument for cases in which the parameters under the null hypothesis are not identified by the parameters under the alternative. To implement such custom hypotheses, we provide additional guidance in the “adding_hypotheses” vignette. Some templates for hypothesis objects are included in the “hypothesis_templates” vignette.

## irtpwr Function

The irtpwr function can be used to perform the power analysis. Besides an hypothesis, an alpha niveau needs to be specified. The function can:

• calculate a necessary sample size N given a desired power (using the power argument, e.g. power = .8)
• calculate the power for a specific sample size N (using the N argument, e.g. N = 600)

For example, to calculate the power for a sample size of 600 using our DIF hypothesis from above:

res <- irtpwr(hyp = hyp, N = 600, alpha = 0.05)
summary(res)
#>
#>  Power for N = 600 (alpha = 0.05):
#>
#>  Statistic  Power
#>       Wald 0.5411
#>         LR 0.5501
#>      Score 0.5481
#>
#> Method: Analytical

The object resulting from irtpwr can be used to gather further information about the relationship between sample size and power. For example, we can obtain the sample size necessary for reaching a power of .8.

summary(res, power = 0.8)
#>
#>  Sample sizes for power = 0.8 (alpha = 0.05):
#>
#>  Statistic    N
#>       Wald 1061
#>         LR 1039
#>      Score 1044
#>
#> Method: Analytical

We can also obtain the power for a different sample size than originally specified in our execution of the irtpwr function.

summary(res, N = 700)
#>
#>  Power for N = 700 (alpha = 0.05):
#>
#>  Statistic  Power
#>       Wald 0.6109
#>         LR 0.6203
#>      Score 0.6182
#>
#> Method: Analytical

The irtpwr function builds on one of two available methods that are described in our paper in more detail (Zimmer et al. (2022), https://doi.org/10.1007/s11336-022-09883-5):

• an analytical approach (method = “analytical”, this is the default option)
• a sampling-based approach (method = “sampling”)

To use the sampling based approach, we can execute:

res <- irtpwr(hyp = hyp, N = 600, alpha = 0.05, method = "sampling")
summary(res)
#>
#>  Power for N = 600 (alpha = 0.05):
#>
#>  Statistic  Power
#>       Wald 0.5877
#>         LR 0.5975
#>      Score 0.5956
#>
#> Method: Sampling-Based

The sampling-based approach offers two parameters that can be increased from their default values to obtain a more exact result. These are the sample size of the sampling-based approach (sampling.npers) and the sample size for the approximation of the Fisher expected matrix (approx.npers). These may be tweaked upwards if the result from repeated executions of the irtpwr function is not stable.

Since the analytical approach can be too time-intensive for larger numbers of items, we might want to know beforehand how long it will take approximately. We do so using the calc.time function.

In our example hypothesis, we used 5 items. To estimate the computation time for 7 items we can use:

calc.time(hyp, n.items = 7)
#> [1] 98.56

Note that the estimated time can sometimes deviate from the actual time. As a result from our test runs, we suggest to expect an increase of 80% as a worst case scenario.

# Further Examples

To demonstrate extending the method to models not treated in our paper, we test some basic hypotheses in the 3PL model and a multidimensional model. We present only the sampling-based method here, but the analytical method can be extended to account for it in the future.

## 3PL Model

We test the null hypothesis that the first item parameters are (1,0,.2) for the a,d, and g parameters respectively. This hypothesis is implemented as the type “3PL_basic” we later use in the setup.hypothesis function. The parameters under the alternative hypothesis are:

altpars <- list(
a = c(1.1,seq(.5,1.4,length.out=4)),
d = c(.1,seq(1.3,-1.3,length.out=4)),
g = c(.22,rep(.2,4))
)

We setup the hypothesis:

hyp <- setup.hypothesis(type = "3PL_basic", altpars = altpars)

We calculate the power using a sampling-based approach. Since we ran into convergence problems using the default SE.type value, we choose an alternative (SE.type = “Fisher”).

res <- irtpwr(hyp = hyp, alpha = 0.05, power = 0.8,
method = "sampling", SE.type = "Fisher")

We can obtain the result using

summary(res)
#>
#>  Sample sizes for power = 0.8 (alpha = 0.05):
#>
#>  Statistic    N
#>       Wald 3658
#>         LR 3956
#>      Score 3967
#>
#> Method: Sampling-Based

## Multidimensional Model

We test the null hypothesis that the difficulty of the first two item parameters is equal in a multidimensional 2PL model. This hypothesis is implemented as the type “multi_basic” we later use in the setup.hypothesis function. The parameters under the alternative hypothesis are taken from the LSAT7 dataset:

dat <- expand.table(LSAT7)
dat <- dat[, c(2, 4, 1, 3, 5)]  # re-ordering items so that items 1 and 2 of the resulting data frame are compared
model_an <- "F1 = 1-5
F2 = 1-4"
altpars <- mirt(dat, model = mirt.model(model_an),
verbose = FALSE)

We setup the hypothesis:

hyp <- setup.hypothesis(type = "multi_basic", altpars = altpars)

We calculate the power using a sampling-based approach. We again encountered convergence problems using the default SE.type value, so we choose an alternative (SE.type = “Fisher”).

res <- irtpwr(hyp = hyp, alpha = 0.05, power = 0.8,
method = "sampling", SE.type = "Fisher")
#> EM cycles terminated after 5000 iterations.

We can obtain the result using

summary(res)
#>
#>  Sample sizes for power = 0.8 (alpha = 0.05):
#>
#>  Statistic    N
#>       Wald 1584
#>         LR 1164
#>      Score 1151
#> Method: Sampling-Based