# replicateBE

### Comparative BA-calculation for the EMA’s Average Bioequivalence with Expanding Limits (ABEL)

#### 2019-11-11

library(replicateBE) # attach the library

Version 1.0.12 built 2019-11-11 with R 3.6.1.

# Introduction

The library provides data sets (internal .rda and in CSV-format in /extdata/) which support users in a black-box performance qualification (PQ) of their software installations.1 Users can analyse own data imported from CSV- and Excel-files.

The methods given by the European Medicines Agency (EMA) in Annex I2 for reference-scaling according to the EMA’s Guideline on the Investigation of Bioequivalence3 are implemented. Potential influence of outliers on the variability of the reference can be assessed by box plots of studentized and standardized residuals as suggested at a joint EGA/EMA symposium.4

# Functions

## method.A()

A linear model of log-transformed PK responses and effects
sequence, subject(sequence), period, treatment
where all effects are fixed (i.e., ANOVA). Estimated via function lm() of library stats.

## method.B()

A linear model of log-transformed PK responses and effects
sequence, subject(sequence), period, treatment
where subject(sequence) is a random effect and all others are fixed.

Three options are provided

1. Estimated via function lmer() of library lmerTest.

Employs Satterthwaite’s approximation5 of the degrees of freedom method.B(..., option = 1) equivalent to SAS’ DDFM=SATTERTHWAITE and Phoenix WinNonlin’s DF Satterthwaite. Note that this is the only available approximation in SPSS.

1. Estimated via function lme() of library nlme.

Employs degrees of freedom equivalent to SAS’ DDFM=CONTAIN and Phoenix WinNonlin’s DF Residual. Implicitly preferred according to the EMA’s Q&A document. Hence, method.B(..., option = 2) is the default (i.e., if the argument option is missing).

1. Estimated via function lmer() of library lmerTest.

Employs the Kenward-Roger approximation6 of the degrees of freedom method.B(..., option = 3) equivalent to SAS’ DDFM=KENWARDROGER. Note that this is the only available approximation in JMP.

## ABE()

Conventional Average Bioequivalence, where the model is identical to Method A. Tighter limits for narrow therapeutic index drugs (EMA 90.00 – 111.11%) or wider limits (75.00 – 133.33% for Cmax according to the guideline of the GCC) can be specified by the arguments theta1 (lower limit) and/or theta2 (upper limit).

# Tested designs

Details about the reference data sets and their designs:

help("data", package = "replicateBE")
?replicateBE::data

## Four period (full) replicates

Both the test and the reference treatments are administered at least once.

TRTR | RTRT
TRRT | RTTR
TTRR | RRTT

### Four sequences

TRTR | RTRT | TRRT | RTTR
TRRT | RTTR | TTRR | RRTT

Although supported, these design are not recommended due to confounded effects.

## Three period (full) replicates

The test treatment is administered at least once to ½ of the subjects and the reference treatment at least once to the respective other ½ of the subjects.

TRT | RTR
TRR | RTT

## Two period (full) replicate

The test and reference treatments are administered once to ½ of the subjects (for the estimation of the CI), i.e., the first group of subjects follows a conventional 2×2×2 trial. In the second group the test and reference treatments are administered at least once to ¼ of the subjects, repectively (for the estimation of CVwT and CVwR).

TR | RT | TT | RR

Although supported, Balaam’s design7 is not recommended due to its poor power characteristics.

## Three period (partial) replicates

The test treatment is administered once and the reference treatment at least once.

TRR | RTR | RRT
TRR | RTR

The latter is the so-called extra-reference design8 which is not recommended since it is biased in the presence of period effects.

# Data structure

Columns must have the headers subject, period, sequence, treatment, PK, and/or logPK.9 Any order of columns is acceptable. Uppercase and mixed case headers will be internally converted to lowercase headers.

## Format

Variable Format
subject Integer numbers or any combination of alphanumerics (A-Z, a-z, -, _, #, 0-9)
period Integer numbers
sequence Numbers or literal sequences not listed in the tested designs are not accepted (e.g., ABAB).
treatment The Test treatment must be coded T and the Reference R (both uppercase).
PK Real positive numbers of PK responses.
logPK Real numbers of already loge-transformed PK responses (optional and rarely needed).

Relevant data are used for the estimation of CVwR (and CVwT in full replicate designs) and BE, i.e., the data sets might be different (see the example below). It is good practice to state that in the Statistical Analysis Plan (SAP).

## Incomplete data

### Estimation of CVw

If a subject drops out from the study in a higher period, data of repeated administrations will still be used for the estimation of CVw, although data of the other treatment might be missing. Examples for the estimation of CVwR (missings denoted by .):

RTRT | RTR.
TRRT | TRR.
RRTT | RRT. or RR..
RRT | RR..

### Assessment of BE

If a subject drops out from the study in a higher period, data with at least one administration of the Test and Reference will be used in the assessment of BE. Examples (missings denoted by .):

TRTR | TRT. or | TR..
RTRT | RTR. or | RT..
TRRT | TRR. or | TR..
RTTR | RTT. or | RT..
TTRR | TTR.
TRT | TR.
RTR | RT.
TRR | TR.
RTR | RT.
RTT | RT.

### Example of different data sets

16 subjects enrolled in the study. In sequence RTRT one dropout in the 2nd period and one in the 4th period. In sequence TRTR one dropout in the 3rd period and one in the 4th.

1 RTR.  5 RTRT  9 TRTR 13 RTRT
2 RTRT  6 TR.. 10 TRTR 14 TRT.
3 RTRT  7 RTRT 11 RTRT 15 TRTR
4 TRTR  8 R... 12 TRTR 16 TRTR

We obtain these data sets:

Purpose included excluded
Estimation of CVwR 13 who received 2 treatments R 6, 8, 14
Estimation of CVwT 12 who received 2 treatments T 1, 6, 8
Assessment of BE 15 who received ≥1 treatment T and ≥1 treatment R 8

# Notes on the methods

## Estimation of intra-subject variability

The EMA proposed a linear model of log-transformed PK responses of the reference treatment
sequence, subject(sequence), period
where all effects are fixed. Estimated via function lm() of library stats:

B <- method.B(print = FALSE, details = TRUE, data = eval(parse(text = ds[i])))$BE r <- paste0("A ", A, ", B ", B, " \u2013 ") cat(paste0(ds[i], ":"), r) if (A == B) { cat("Methods agree.\n") } else { if (A == "fail" & B == "pass") { cat("Method A is conservative.\n") } else { cat("Method B is conservative.\n") } } } # rds01: A pass, B pass – Methods agree. # rds02: A pass, B pass – Methods agree. # rds03: A pass, B pass – Methods agree. # rds04: A fail, B fail – Methods agree. # rds05: A pass, B pass – Methods agree. # rds06: A pass, B pass – Methods agree. # rds07: A pass, B pass – Methods agree. # rds08: A pass, B pass – Methods agree. # rds09: A pass, B pass – Methods agree. # rds10: A pass, B pass – Methods agree. # rds11: A pass, B pass – Methods agree. # rds12: A fail, B fail – Methods agree. # rds13: A fail, B fail – Methods agree. # rds14: A pass, B fail – Method B is conservative. # rds15: A fail, B fail – Methods agree. # rds16: A fail, B fail – Methods agree. # rds17: A fail, B fail – Methods agree. # rds18: A fail, B fail – Methods agree. # rds19: A fail, B fail – Methods agree. # rds20: A fail, B fail – Methods agree. # rds21: A fail, B fail – Methods agree. # rds22: A pass, B pass – Methods agree. # rds23: A pass, B pass – Methods agree. # rds24: A pass, B pass – Methods agree. # rds25: A pass, B pass – Methods agree. # rds26: A fail, B fail – Methods agree. # rds27: A pass, B pass – Methods agree. # rds28: A pass, B pass – Methods agree. Exploring data set 14: All variants of ‘Method B’ are more conservative than ‘Method A’. Before rounding the confidence interval, option = 2 with 192 degrees of freedom would be more conservative (lower CL 69.21029) than option = 1 with 197.44 degrees of freedom (lower CL 69.21286). Given the incompleteness of this data set (four missings in period 2, twelve in period 3, and 19 in period 4), Satterthwaite’s or Kenward-Roger degrees of freedom are probably the better choice. ## Outlier analysis It is an open issue how outliers should be handled. The applicant should justify that the calculated intra-subject variability is a reliable estimate and that it is not the result of outliers. — BE-Guideline Box plots were ‘suggested’ by the author as a mere joke [sic] at the EGA/EMA symposium, being aware of their non­para­metric nature and the EMA’s reluctance towards robust methods. Alas, this joke was included in the Q&A document. […] a study could be acceptable if the bioequivalence requirements are met both including the outlier subject (using the scaled average bioequivalence approach and the within-subject CV with this subject) and after exclusion of the outlier (using the within-subject CV without this subject). An outlier test is not an expectation of the medicines agencies but outliers could be shown by a box plot. This would allow the medicines agencies to compare the data between them. — EGA/EMA Q&A-document With the additional argument ola = TRUE in method.A() and method.B() an outlier analysis is performed, where the default fence = 2.14 Results differ slightly depending on software’s algorithms to calculate the median and quartiles. Example with the ‘types’ implemented in R (note the differences even in the medians): • Box plots of studentized15 and standarized16 model residuals are constructed.17 • Potential outliers are flagged based on the argument fence provided by the user. • With the additional argument verbose = TRUE detailed information is shown in the console. Example for the reference data set 01: Outlier analysis (externally) studentized residuals Limits (2×IQR whiskers): -1.717435, 1.877877 Outliers: subject sequence stud.res 45 RTRT -6.656940 52 RTRT 3.453122 standarized (internally studentized) residuals Limits (2×IQR whiskers): -1.69433, 1.845333 Outliers: subject sequence stand.res 45 RTRT -5.246293 52 RTRT 3.214663 If based on studentized residuals outliers are detected, additionally to the expanded limits based on the complete reference data, tighter limits are calculated based on CVwR after exclusion of outliers and BE assessed with the new limits. Standardized residuals are shown for informational purposes only and are not used for exclusion of outliers. Output for the reference data set 01 (re-ordered for clarity): CVwR : 46.96% (reference-scaling applicable) swR : 0.44645 Expanded limits : 71.23% ... 140.40% [100exp(±0.760·swR)] Assessment based on original CVwR 46.96% ──────────────────────────────────────── Confidence interval: 107.11% ... 124.89% pass Point estimate : 115.66% pass Mixed (CI & PE) : pass ╟────────┼─────────────────────┼───────■────────◊─────────■───────────────╢ Outlier fence : 2×IQR of studentized residuals. Recalculation due to presence of 2 outliers (subj. 45|52) ───────────────────────────────────────────────────────── CVwR (outl. excl.) : 32.16% (reference-scaling applicable) swR (recalculated) : 0.31374 Expanded limits : 78.79% ... 126.93% [100exp(±0.760·swR)] Assessment based on recalculated CVwR 32.16% ──────────────────────────────────────────── Confidence interval: pass Point estimate : pass Mixed (CI & PE) : pass ╟┼─────────────────────┼───────■────────◊─────────■─╢ Note that the PE and its CI are not affected since the entire data are used and therefore, these values not reported in the second analysis (only the conclusion of the assessement). The ‘line plot’ is given for informational purposes since its resolution is only ~0.5%. The filled squares are the lower and upper 90% confidence limits, the rhombus the point estimate, the vertical lines at 100% and the PE restriction (80.00 – 125.00%), and the double vertical lines the expanded limits. The PE and CI take pre­se­dence over other symbols. In this case the upper limit of the PE restriction is not visible. Since both analyses arrive at the same conclusion, the study should be acceptable according to the Q&A document. # Applicability, caveats, outlook The EMA’s approach of reference-scaling for highly variable drugs / drug products is currently recommended in other jurisdictions as well (e.g., the WHO; ASEAN States, Australia, Brazil, the East African Community, Egypt, the Eurasian Economic Union, New Zealand, the Russian Federation). The estimated CVwR is always uncertain (the degree of uncertainty depends on the CVwR itself, the design, and the sample size), which might lead to an inflation of the type I error (i.e., if ABEL is falsely applied although the true – but unknown – CVwR is lower than its estimate).18, 19 Use the optional argument method.A(..., adjust = TRUE) to iteratively adjust α to control the type I error.20 If you want to apply the most conservative approach of Molins et al.21 (which corrects for CVwR 30% instead of the observed one), get the data.frame of results with x <- method.A(..., details = TRUE, print = FALSE). Adjust α in library PowerTOST and call method.A() again: design <- "2x2x4" # your design n <- as.integer(strsplit(x[[6]], "|", fixed = TRUE)[[1]]) # subjects / sequence y <- PowerTOST::scABEL.ad(CV = 0.3, n = n, design = design, print = FALSE) method.A(..., alpha = y$alpha.adj)

The WHO accepts reference-scaling for AUC (four period full replicate studies are mandatory in order to assess the vari­ability associated with each product). It is not evident how this assessment should be done./ In Population Bioequivalence (PBE) and Individual Bioequivalence (IBE) the swT/swR ratio was assessed and ‘similar’ variability was concluded for a ratio within 0.667 – 1.500. However, the power of comparing variabilities in a study designed to demonstrate ABE is low. This was one of the reasons why PBE and IBE were not implemented in regulatory practice. An alternative approach is given in the FDA’s guidance on warfarin where variabilities are considered ‘comparable’ if the upper confidence limit of σwT/σwR is ≤2.5.

# Cross-validation

Results of all reference data sets agree with ones obtained in SAS (v9.4), Phoenix WinNonlin (v6.4 – v8.1), STATISTICA (v13), SPSS (v22.0), Stata (v15.0), and JMP (v10.0.2).

# Contributors

• Helmut Schütz (Author)
• Michael Tomashevskiy (Contributor)
• Detlew Labes (Contributor)

# Disclaimer

Program offered for Use without any Guarantees and Absolutely No Warranty.
No Liability is accepted for any Loss and Risk to Public Health Resulting from Use of this R-Code.

1. Schütz H, Tomashevskiy M, Labes D, Shitova A, González-de la Parra M, Fuglsang A. Reference Data­sets for Studies in a Replicate Design intended for Average Bioequivalence with Expanding Limits. Manu­script in pre­paration 2019.

2. European Medicines Agency. Annex I. London, 21 September 2016. EMA/582648/2016.

3. European Medicines Agency, Committee for Medicinal Products for Human Use. Guideline on the Investigation of Bioequivalence. London, 20 January 2010. CPMP/EWP/QWP/1401/98 Rev. 1/Corr **.

4. European Generic Medicines Association. Revised EMA Bioequivalence Guideline. 3rd EGA Symposium on Bioequivalence. London, 1 June 2010. Questions & Answers.

5. Satterthwaite FE. An Approximate Distribution of Estimates of Variance Components. Biometrics Bulletin. 1946; 2(6): 110–4. doi:10.2307/3002019.

6. Kenward MG, Roger JH. Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood. Biometrics. 1997; 53(3): 983–97. doi:10.2307/2533558.

7. Balaam LN. A Two-Period Design with t2 Experimental Units. Biometrics. 1968; 24(1): 61–73. doi:10.2307/2528460.

8. Chow, SC, Shao J, Wang H. Individual bioequivalence testing under 2×3 designs. Stat Med. 2002; 21(5): 629–48. doi:10.1002/sim.1056.

9. Napierian logarithm (base e). The decadic logarithm (base 10) is not supported).

10. European Medicines Agency. Questions & Answers: positions on specific questions addressed to the Pharmacokinetics Working Party (PKWP). London, June 2015 (and later revisons). EMA/618604/2008.

11. Contradics the law of simplicity. Such a nesting is superfluous since in BE trials subjects are uniquely coded. If, say, subject 1 is allocated to sequence TRTR there is not yet ‘another’ subject 1 allocated to sequence RTRT. This explains the many lines in SAS PROC GML given with . and in Phoenix WinNonlin as not estimable.

12. European Medicines Agency, Committee for Medicinal Products for Human Use. Guideline on the pharmacokinetic and clinical evaluation of modified release dosage forms. London, 20 November 2014. EMA/CHMP/EWP/280/96

13. Of course, only if the point estimates are identical.

14. The fences are given by the lowest datum still within m×IQR of the lower quartile, and the highest datum still within m×IQR of the upper quartile, where IQR is the interquartile range (difference between the 3rd and 1st quartiles). Data outside fences are considered outliers. Decreasing the multiplier m to e.g., 1.5 might result in many outliers, whereas increasing the multiplier in only a few.
Different methods exist to calculate quartiles (nine ‘types’ are available in R, where the default is type = 7). R’s default is used by S, MATLAB, Octave, and Excel. Phoenix WinNonlin, Minitab, and SPSS use type = 6, ScyPy uses type = 4, whereas the default in SAS and Stata is type = 2 (though others are available as well).

15. Externally studentized: $$\widehat{\sigma}_{(i)}^2={1 \over n-m-1}\sum_{\begin{smallmatrix}j = 1\\j \ne i\end{smallmatrix}}^n \widehat{\varepsilon\,}_j^{\,2}$$

16. Internally studentized: $$\widehat{\sigma}^2={1 \over n-m}\sum_{j=1}^n \widehat{\varepsilon\,}_j^{\,2}$$

17. Both are available in SAS and R, whereas only the latter in e.g., Phoenix WinNonlin. In general the former are slightly more restrictive. Which one will be used has to be stated in the SAP.

18. Wonnemann M, Frömke C, Koch A. Inflation of the Type I Error: Investigations on Regulatory Recommendations for Bioequivalence of Highly Variable Drugs. Pharm Res. 2015; 32(1): 135–43. doi:10.1007/s11095-014-1450-z.

19. Muñoz J, Alcaide D, Ocaña J. Consumer’s risk in the EMA and FDA regulatory approaches for bioequivalence in highly variable drugs. Stat Med. 2016; 35(12): 1933–43. doi:10.1002/sim.6834.

20. Labes D, Schütz H. Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control. Pharm Res. 2016; 33(11): 2805–14. doi:10.1007/s11095-016-2006-1.

21. Molins E, Cobo E, Ocaña J. Two-Stage Designs Versus European Scaled Average Designs in Bioequivalence Studies for Highly Variable Drugs: Which to Choose? Stat Med. 2017: 36(30); 4777–88. doi:10.1002/sim.7452.