The package contains functions to calculate power and estimate sample size for various study designs used in (not only bio-) equivalence studies.

Version 1.4.9.9999 built 2020-08-04 with R 4.0.2.

```
#     design                        name    df
#   parallel           2 parallel groups   n-2
#        2x2               2x2 crossover   n-2
#      2x2x2             2x2x2 crossover   n-2
#        3x3               3x3 crossover 2*n-4
#      3x6x3             3x6x3 crossover 2*n-4
#        4x4               4x4 crossover 3*n-6
#      2x2x3   2x2x3 replicate crossover 2*n-3
#      2x2x4   2x2x4 replicate crossover 3*n-4
#      2x4x4   2x4x4 replicate crossover 3*n-4
#      2x3x3   partial replicate (2x3x3) 2*n-3
#      2x4x2            Balaam's (2x4x2)   n-2
#     2x2x2r Liu's 2x2x2 repeated x-over 3*n-2
#     paired                paired means   n-1
```

Codes of designs follow this pattern: `treatments x sequences x periods`.
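As an illustration of the naming pattern, a three-part code can be split into its components with base R. `parse_design()` is a hypothetical helper written for this sketch, not a PowerTOST function, and it only handles codes that follow the three-part pattern (not, e.g., `parallel`, `2x2`, or `2x2x2r`).

```r
# Hypothetical helper (not part of PowerTOST): split a design code of the
# form treatments x sequences x periods into its three integer components.
parse_design <- function(code) {
  parts <- suppressWarnings(as.integer(strsplit(code, "x", fixed = TRUE)[[1]]))
  stopifnot(length(parts) == 3L, !anyNA(parts))
  setNames(parts, c("treatments", "sequences", "periods"))
}

parse_design("2x2x4")  # 2 treatments, 2 sequences, 4 periods
parse_design("2x3x3")  # 2 treatments, 3 sequences, 3 periods
```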

Although some replicate designs are more ‘popular’ than others, sample size estimations are valid for *all* of the following designs:

design | type | sequences | periods |
---|---|---|---|
`2x2x4` | full | 2 `TRTR\|RTRT` | 4 |
`2x2x4` | full | 2 `TRRT\|RTTR` | 4 |
`2x2x4` | full | 2 `TTRR\|RRTT` | 4 |
`2x2x3` | full | 2 `TRT\|RTR` | 3 |
`2x2x3` | full | 2 `TRR\|RTT` | 3 |
`2x3x3` | partial | 3 `TRR\|RTR\|RRT` | 3 |

Whilst “2x4x4” four period full replicate designs with four sequences (TRTR|RTRT|TRRT|RTTR *or* TRRT|RTTR|TTRR|RRTT) are supported, they should be avoided due to confounded effects.

For various methods power can be *calculated* based on

- nominal *α*, coefficient of variation (*CV*), deviation of test from reference (*θ*_{0}), acceptance limits {*θ*_{1}, *θ*_{2}}, sample size (*n*), and design.

For all methods the sample size can be *estimated* based on

- nominal *α*, coefficient of variation (*CV*), deviation of test from reference (*θ*_{0}), acceptance limits {*θ*_{1}, *θ*_{2}}, target power, and design.

Power covers balanced as well as unbalanced sequences in crossover or replicate designs and equal/unequal group sizes in two-group parallel designs. Sample sizes are always rounded up to achieve balanced sequences or equal group sizes.
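The rounding rule can be sketched in base R as follows. `round_up_balanced()` is a hypothetical helper for illustration only; it assumes that "balanced" means a total sample size that is a multiple of the number of sequences (or groups).

```r
# Hypothetical helper (not part of PowerTOST): round a total sample size up
# to the next multiple of the number of sequences (or groups).
round_up_balanced <- function(n, sequences) {
  as.integer(ceiling(n / sequences) * sequences)
}

round_up_balanced(37, sequences = 3)  # 39, e.g. TRR|RTR|RRT
round_up_balanced(19, sequences = 2)  # 20, e.g. TR|RT
```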

- Average Bioequivalence (with arbitrary *fixed* limits).
- Two simultaneous TOST procedures.
- Non-inferiority *t*-test.
- Ratio of two means with normally distributed data on the original scale based on Fieller’s (‘fiducial’) confidence interval.
- ‘Expected’ power in case of uncertain (estimated) variability and/or uncertain *θ*_{0}.
- Reference-scaled bioequivalence based on simulations.
  - EMA: Average Bioequivalence with Expanding Limits (ABEL).
  - FDA: Reference-scaled Average Bioequivalence (RSABE) for Highly Variable Drugs / Drug Products and Narrow Therapeutic Index Drugs (NTIDs).
- Iteratively adjust *α* to control the type I error in ABEL and RSABE.
- Dose-Proportionality using the power model.

- Exact
  - Owen’s Q.
  - Direct integration of the bivariate non-central *t*-distribution.
- Approximations
  - Non-central *t*-distribution.
  - ‘Shifted’ central *t*-distribution.

- Calculate *CV* from *MSE* or *SE* (and vice versa).
- Calculate *CV* from given confidence interval.
- Calculate *CV*_{wR} from the upper expanded limit of an ABEL study.
- Confidence interval of *CV*.
- Pool *CV* from several studies.
- Confidence interval for given *α*, *CV*, point estimate, sample size, and design.
- Calculate *CV*_{wT} and *CV*_{wR} from a (pooled) *CV*_{w} assuming a ratio of intra-subject variances.
- *p*-values of the TOST procedure.
- Analysis tool for exploration/visualization of the impact of expected values (*CV*, *θ*_{0}, reduced sample size due to dropouts) on power of BE decision.
- Construct design matrices of incomplete block designs.

- *α* 0.05, {*θ*_{1}, *θ*_{2}} (0.80, 1.25). Details of the sample size search (and the regulatory settings in reference-scaled average bioequivalence) are printed.
- Note: In all functions values have to be given as ratios, not in percent.

*θ*_{0} 0.95, target power 0.80, design “2x2” (TR|RT), exact method (Owen’s Q).

*α* 0.05, point estimate constraint (0.80, 1.25), homoscedasticity (*CV*_{wT} = *CV*_{wR}).

- EMA, WHO, Health Canada, and many others: Average bioequivalence with expanding limits (ABEL).
- FDA: RSABE.

*θ*_{0} 0.90 as recommended by Tóthfalusi and Endrényi (2011).

Regulatory constant `0.76`, upper cap of scaling at *CV*_{wR} 50%, evaluation by ANOVA.

Regulatory constant `0.76`, upper cap of scaling at *CV*_{wR} ~57.4%, evaluation by intra-subject contrasts.

Regulatory constant `log(1.25)/0.25`, linearized scaled ABE (Howe’s approximation).

*θ*_{0} 0.975, regulatory constant `log(1.11111)/0.1`, upper cap of scaling at *CV*_{wR} ~21.4%, design “2x2x4” (TRTR|RTRT), linearized scaled ABE (Howe’s approximation), upper limit of the confidence interval of *s*_{wT}/*s*_{wR} ≤ 2.5.

*β*_{0} (slope) `1+log(0.95)/log(rd)` where `rd` is the ratio of the highest and lowest dose, target power 0.80, crossover design, details of the sample size search suppressed.
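To make the default slope concrete, here is a small base-R calculation of `1+log(0.95)/log(rd)`. The doses 1, 2, and 8 are borrowed from the dose-proportionality example further down; the variable names are chosen for this sketch only.

```r
# Default slope for the power model, evaluated for an assumed dose range.
doses <- c(1, 2, 8)
rd <- max(doses) / min(doses)             # ratio of highest to lowest dose
beta0_default <- 1 + log(0.95) / log(rd)  # formula quoted in the text
signif(beta0_default, 5)
```

For this dose range the default slope is roughly 0.975; it moves closer to 1 as the dose range widens.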

Minimum acceptable power 0.70. *θ*_{0}, design, conditions, and sample size method depend on defaults of the respective approaches (ABE, ABEL, RSABE, NTID).

Before running the examples attach the library.

If not noted otherwise, defaults are employed.

Power for total *CV* 0.35, *θ*_{0} 0.95, group sizes 52 and 49, design “parallel”.
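The calls for attaching the library and for this first example are not shown above. A minimal sketch, assuming PowerTOST is installed from CRAN (the `requireNamespace()` guard only keeps the snippet runnable where it is not):

```r
# Attach the library and calculate power for a parallel design with
# unequal group sizes. theta0 is left at its default of 0.95.
if (requireNamespace("PowerTOST", quietly = TRUE)) {
  library(PowerTOST)
  pwr <- power.TOST(CV = 0.35, n = c(52, 49), design = "parallel")
} else {
  pwr <- NA_real_  # package not available; nothing to calculate
}
print(pwr)
```

For parallel designs `n` is given here as a vector of the two group sizes rather than as a total.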

Sample size for assumed intra-subject *CV* 0.20.

```
sampleN.TOST(CV = 0.20)
#
# +++++++++++ Equivalence test - TOST +++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2 crossover
# log-transformed data (multiplicative model)
#
# alpha = 0.05, target power = 0.8
# BE margins = 0.8 ... 1.25
# True ratio = 0.95, CV = 0.2
#
# Sample size (total)
# n power
# 20 0.834680
```

Sample size for equivalence of the ratio of two means with normality on original scale based on Fieller’s (‘fiducial’) confidence interval. *CV*_{w} 0.20, *CV*_{b} 0.40.

Note the default *α* of 0.025.

```
sampleN.RatioF(CV = 0.20, CVb = 0.40)
#
# +++++++++++ Equivalence test - TOST +++++++++++
# based on Fieller's confidence interval
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2 crossover
# Ratio of means with normality on original scale
# alpha = 0.025, target power = 0.8
# BE margins = 0.8 ... 1.25
# True ratio = 0.95, CVw = 0.2, CVb = 0.4
#
# Sample size
# n power
# 28 0.807774
```

Sample size for assumed intra-subject *CV* 0.45, *θ*_{0} 0.90, three period full replicate study “2x2x3” (TRT|RTR *or* TRR|RTT).

```
sampleN.TOST(CV = 0.45, theta0 = 0.90, design = "2x2x3")
#
# +++++++++++ Equivalence test - TOST +++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2x3 (3 period full replicate)
# log-transformed data (multiplicative model)
#
# alpha = 0.05, target power = 0.8
# BE margins = 0.8 ... 1.25
# True ratio = 0.9, CV = 0.45
#
# Sample size (total)
# n power
# 124 0.800125
```

Note that the conventional model assumes homoscedasticity. For heteroscedasticity we can ‘switch off’ all conditions of one of the methods for reference-scaled ABE. We assume a σ^{2} ratio of ⅔ (*i.e.*, T has a lower variability than R). Only relevant columns of the data.frame shown.

```
reg <- reg_const("USER", r_const = NA, CVswitch = Inf,
                 CVcap = Inf, pe_constr = FALSE)
CV  <- CVp2CV(CV = 0.45, ratio = 2/3)
res <- sampleN.scABEL(CV = CV, design = "2x2x3", regulator = reg,
                      details = FALSE, print = FALSE)
print(res[c(3:4, 8:9)], digits = 5, row.names = FALSE)
#    CVwT    CVwR Sample size Achieved power
#  0.3987 0.49767         126         0.8052
```

Similar sample size because the pooled *CV* is still 0.45.
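The split of the pooled *CV* into *CV*_{wT} and *CV*_{wR} can be retraced in base R. This is a sketch under the assumption that the pooled variance on the log scale is the mean of the two intra-subject variances; `split_cv()` is a hypothetical re-derivation, not the code of `CVp2CV()`.

```r
# Hypothetical re-derivation (not PowerTOST code): split a pooled CV into
# CVwT and CVwR for a given ratio of intra-subject variances s2wT/s2wR.
split_cv <- function(cv_pooled, ratio) {
  s2_pooled <- log(cv_pooled^2 + 1)     # pooled variance on the log scale
  # assumption: (s2_wT + s2_wR) / 2 = s2_pooled and s2_wT = ratio * s2_wR
  s2_wR <- 2 * s2_pooled / (1 + ratio)
  s2_wT <- ratio * s2_wR
  sqrt(exp(c(CVwT = s2_wT, CVwR = s2_wR)) - 1)
}

cv <- split_cv(0.45, ratio = 2/3)
signif(cv, 5)
```

Under this assumption the result agrees with the `CVwT` and `CVwR` columns printed above.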

Sample size assuming homoscedasticity (*CV*_{w} = 0.45).

```
sampleN.scABEL(CV = 0.45, details = TRUE)
#
# +++++++++++ scaled (widened) ABEL +++++++++++
# Sample size estimation
# (simulation based on ANOVA evaluation)
# ---------------------------------------------
# Study design: 2x3x3 (partial replicate)
# log-transformed data (multiplicative model)
# 1e+05 studies for each step simulated.
#
# alpha = 0.05, target power = 0.8
# CVw(T) = 0.45; CVw(R) = 0.45
# True ratio = 0.9
# ABE limits / PE constraint = 0.8 ... 1.25
# EMA regulatory settings
# - CVswitch = 0.3
# - cap on scABEL if CVw(R) > 0.5
# - regulatory constant = 0.76
# - pe constraint applied
#
#
# Sample size search
# n power
# 36 0.7755
# 39 0.8059
```

Iteratively adjust *α* to control the Type I Error (Labes, Schütz). Slight heteroscedasticity (*CV*_{wT} 0.30, *CV*_{wR} 0.35), four period full replicate “2x2x4” study, 30 subjects.

```
scABEL.ad(CV = c(0.30, 0.35), design = "2x2x4", n = 30)
# +++++++++++ scaled (widened) ABEL ++++++++++++
# iteratively adjusted alpha
# (simulations based on ANOVA evaluation)
# ----------------------------------------------
# Study design: 2x2x4 (4 period full replicate)
# log-transformed data (multiplicative model)
# 1,000,000 studies in each iteration simulated.
#
# CVwR 0.35, CVwT 0.3, n(i) 15|15 (N 30)
# Nominal alpha : 0.05
# True ratio : 0.9000
# Regulatory settings : EMA (ABEL)
# Switching CVwR : 0.3
# Regulatory constant : 0.76
# Expanded limits : 0.7723 ... 1.2948
# Upper scaling cap : CVwR > 0.5
# PE constraints : 0.8000 ... 1.2500
# Empiric TIE for alpha 0.0500 : 0.06651
# Power for theta0 0.9000 : 0.814
# Iteratively adjusted alpha : 0.03540
# Empiric TIE for adjusted alpha: 0.05000
# Power for theta0 0.9000 : 0.771
```

With the nominal *α* 0.05 the Type I Error will be inflated (0.0665). With the adjusted *α* 0.0354 (*i.e.*, the 92.92% confidence interval) the TIE will be controlled, although with a slight loss in power (decreases from 0.814 to 0.771).

Consider `sampleN.scABEL.ad(CV = c(0.30, 0.35), design = "2x2x4")` to estimate the sample size which both controls the TIE and maintains the target power. In this example 34 subjects will be required.
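The expanded limits and the confidence level quoted in this example follow from short arithmetic. A base-R sketch, assuming the usual ABEL formula exp(∓k·*s*_{wR}) with *s*_{wR} = sqrt(log(*CV*_{wR}² + 1)), which applies here because *CV*_{wR} lies between the switching *CV* (0.30) and the cap (0.50):

```r
# Expanded limits for CVwR 0.35 with the regulatory constant 0.76.
cv_wr  <- 0.35
k      <- 0.76
s_wr   <- sqrt(log(cv_wr^2 + 1))          # within-subject SD of R (log scale)
limits <- exp(c(lower = -1, upper = 1) * k * s_wr)
round(limits, 4)

# Confidence level (in percent) that corresponds to the adjusted alpha.
alpha_adj  <- 0.0354
conf_level <- 100 * (1 - 2 * alpha_adj)
conf_level
```

The limits agree with the `Expanded limits` line of the output, and the confidence level with the 92.92% mentioned in the text.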

Sample size for a four period full replicate “2x2x4” study (any of TRTR|RTRT, TRRT|RTTR, TTRR|RRTT) assuming heteroscedasticity (*CV*_{wT} 0.40, *CV*_{wR} 0.50). Details of the sample size search suppressed.

```
sampleN.RSABE(CV = c(0.40, 0.50), design = "2x2x4", details = FALSE)
#
# ++++++++ Reference scaled ABE crit. +++++++++
# Sample size estimation
# ---------------------------------------------
# Study design: 2x2x4 (4 period full replicate)
# log-transformed data (multiplicative model)
# 1e+05 studies for each step simulated.
#
# alpha = 0.05, target power = 0.8
# CVw(T) = 0.4; CVw(R) = 0.5
# True ratio = 0.9
# ABE limits / PE constraints = 0.8 ... 1.25
# Regulatory settings: FDA
#
# Sample size
# n power
# 20 0.81509
```

Sample size assuming heteroscedasticity (*CV*_{w} 0.125, σ^{2} ratio 2.5, *i.e.*, T has a substantially higher variability than R).

```
CV <- signif(CVp2CV(CV = 0.125, ratio = 2.5), 4)
n <- sampleN.NTIDFDA(CV = CV)[["Sample size"]]
#
# +++++++++++ FDA method for NTIDs ++++++++++++
# Sample size estimation
# ---------------------------------------------
# Study design: 2x2x4 (TRTR|RTRT)
# log-transformed data (multiplicative model)
# 1e+05 studies for each step simulated.
#
# alpha = 0.05, target power = 0.8
# CVw(T) = 0.1497, CVw(R) = 0.09433
# True ratio = 0.975
# ABE limits = 0.8 ... 1.25
# Implied scABEL = 0.9056 ... 1.1043
# Regulatory settings: FDA
# - Regulatory const. = 1.053605
# - 'CVcap' = 0.2142
#
# Sample size search
# n power
# 28 0.665530
# 30 0.701440
# 32 0.734240
# 34 0.764500
# 36 0.792880
# 38 0.816080
suppressMessages(power.NTIDFDA(CV = CV, n = n, details = TRUE))
# p(BE) p(BE-sABEc) p(BE-ABE) p(BE-sratio)
# 0.81608 0.93848 1.00000 0.85794
```

The *s*_{wT}/*s*_{wR} component shows the lowest power and drives the sample size.

Compare that with homoscedasticity (*CV*_{wT} = *CV*_{wR} = 0.125):

```
CV <- 0.125
n <- sampleN.NTIDFDA(CV = CV, details = FALSE)[["Sample size"]]
#
# +++++++++++ FDA method for NTIDs ++++++++++++
# Sample size estimation
# ---------------------------------------------
# Study design: 2x2x4 (TRTR|RTRT)
# log-transformed data (multiplicative model)
# 1e+05 studies for each step simulated.
#
# alpha = 0.05, target power = 0.8
# CVw(T) = 0.125, CVw(R) = 0.125
# True ratio = 0.975
# ABE limits = 0.8 ... 1.25
# Regulatory settings: FDA
#
# Sample size
# n power
# 16 0.822780
suppressMessages(power.NTIDFDA(CV = CV, n = n, details = TRUE))
# p(BE) p(BE-sABEc) p(BE-ABE) p(BE-sratio)
# 0.82278 0.84869 1.00000 0.95128
```

Here the scaled ABE component shows the lowest power and drives the sample size, which is much lower than in the previous example.
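The implied limits and the cap printed in the heteroscedastic NTID example can be retraced in base R. This is a sketch, assuming the limits are exp(∓r·*s*_{wR}) and that the cap is the *CV*_{wR} at which these limits reach 0.80 ... 1.25. The constant is written as `log(1 / 0.9) / 0.1`, which equals `log(1.11111)/0.1` up to rounding and matches the printed 1.053605.

```r
# Regulatory constant for NTIDs (assumed exact form of log(1.11111)/0.1).
r_const <- log(1 / 0.9) / 0.1

# Implied scaled limits for CVwR 0.09433 (first NTID example).
s_wr    <- sqrt(log(0.09433^2 + 1))
implied <- exp(c(lower = -1, upper = 1) * r_const * s_wr)
signif(implied, 5)

# CVwR at which the implied limits coincide with 0.80 ... 1.25.
s_cap  <- log(1.25) / r_const
cv_cap <- sqrt(exp(s_cap^2) - 1)
signif(cv_cap, 4)
```

Up to rounding this agrees with the `Implied scABEL` and `'CVcap'` lines of the output.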

*CV* 0.20, doses 1, 2, and 8 units, *β*_{0} 1, target power 0.90.

```
sampleN.dp(CV = 0.20, doses = c(1, 2, 8), beta0 = 1, targetpower = 0.90)
#
# ++++ Dose proportionality study, power model ++++
# Sample size estimation
# -------------------------------------------------
# Study design: crossover (3x3 Latin square)
# alpha = 0.05, target power = 0.9
# Equivalence margins of R(dnm) = 0.8 ... 1.25
# Doses = 1 2 8
# True slope = 1, CV = 0.2
# Slope acceptance range = 0.89269 ... 1.1073
#
# Sample size (total)
# n power
# 18 0.915574
```

Note that the acceptance range of the slope depends on the ratio of the highest and lowest doses (*i.e.*, it gets tighter for wider dose ranges and therefore, higher sample sizes will be required).

In an exploratory setting wider equivalence margins {*θ*_{1}, *θ*_{2}} (0.50, 2.00) are recommended, which would translate in this example to an acceptance range of `0.66667 ... 1.3333` and a sample size of only six subjects.
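Both acceptance ranges of the slope can be checked by hand. A base-R sketch, assuming the range is 1 + log(*θ*)/log(`rd`) for each margin; `slope_range()` is a hypothetical helper, not a PowerTOST function.

```r
# Hypothetical helper: acceptance range of the slope in the power model for
# a dose ratio rd and equivalence margins theta1, theta2.
slope_range <- function(rd, theta1 = 0.80, theta2 = 1 / theta1) {
  1 + log(c(lower = theta1, upper = theta2)) / log(rd)
}

rd <- 8 / 1                                # highest / lowest dose
confirmatory <- slope_range(rd)            # margins 0.80 ... 1.25
exploratory  <- slope_range(rd, theta1 = 0.50)  # margins 0.50 ... 2.00
signif(confirmatory, 5)
signif(exploratory, 5)
```

Since log(`rd`) sits in the denominator, a wider dose range shrinks the acceptance range towards 1, which is why higher sample sizes are needed.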

Explore impact of deviations from assumptions (higher *CV*, higher deviation of *θ*_{0} from 1, dropouts) on power. Assumed intra-subject *CV* 0.20, target power 0.90. Suppress the plot.

```
res <- pa.ABE(CV = 0.20, targetpower = 0.90)
print(res, plotit = FALSE)
# Sample size plan ABE
# Design alpha CV theta0 theta1 theta2 Sample size Achieved power
# 2x2 0.05 0.2 0.95 0.8 1.25 26 0.9176333
#
# Power analysis
# CV, theta0 and number of subjects which lead to min. acceptable power of at least 0.7:
# CV= 0.2729, theta0= 0.9044
# n = 16 (power= 0.7354)
```

If the study starts with 26 subjects (power ~0.92), the *CV* can increase to ~0.27 **or** *θ*_{0} decrease to ~0.90 **or** the sample size decrease to 16 whilst power will still be ≥0.70.

However, this is **not** a substitute for the “Sensitivity Analysis” recommended in ICH-E9, since in a real study a combination of all effects occurs simultaneously. It is up to *you* to decide on reasonable combinations and analyze their respective power.

Performed on a Xeon E3-1245v3 3.4 GHz, 8 MB cache, 16 GB RAM, R 4.0.2 64 bit on Windows 7.

“2x2” crossover design, *CV* 0.17. Sample sizes and achieved power for the supported methods (the 1^{st} one is the default).

```
# method n power seconds
# owenq 14 0.805683 0.0015
# mvt 14 0.805690 0.1220
# noncentral 14 0.805683 0.0010
# shifted 16 0.852301 0.0005
```

The 2^{nd} exact method is substantially slower than the 1^{st}. The approximation based on the noncentral *t*-distribution is slightly faster but matches the 1^{st} exact method closely. The approximation based on the shifted central *t*-distribution is the fastest but *might* estimate a sample size higher than necessary. Hence, it should be used only for comparative purposes.
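The comparison can be retraced along these lines. A sketch, assuming PowerTOST is installed and that the `method` argument of `sampleN.TOST()` accepts the four labels used in the table; timings are left out because they depend on the machine.

```r
# Sample size and achieved power for each power method, CV 0.17, "2x2" design.
methods <- c("owenq", "mvt", "noncentral", "shifted")
if (requireNamespace("PowerTOST", quietly = TRUE)) {
  res <- do.call(rbind, lapply(methods, function(m) {
    x <- PowerTOST::sampleN.TOST(CV = 0.17, method = m, print = FALSE)
    data.frame(method = m,
               n      = x[["Sample size"]],
               power  = x[["Achieved power"]])
  }))
} else {
  res <- NULL  # package not available; nothing to compare
}
print(res)
```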

Four period full replicate study, homoscedasticity (*CV*_{wT} = *CV*_{wR}).

```
# method n power seconds
# ‘key’ statistics 28 0.81116 0.16
# subject simulations 28 0.81196 2.32
```

Simulating via the ‘key’ statistics is the method of choice for speed reasons.

However, subject simulations are recommended *if*

- the partial replicate design (TRR|RTR|RRT) is planned *and*
- the special case of heteroscedasticity *CV*_{wT} > *CV*_{wR} is expected.

You can install the released version of PowerTOST from CRAN with

```
package <- "PowerTOST"
inst <- package %in% installed.packages()
if (length(package[!inst]) > 0) install.packages(package[!inst])
```

… and the development version from GitHub with

```
# install.packages("remotes")
remotes::install_github("Detlew/PowerTOST")
```

Skips installation from a GitHub remote if the SHA-1 has not changed since last install. Use `force = TRUE` to force installation.