Kristian Brock


Introducing the BEBOP design

The BEBOP phase II trial methodology incorporates predictive baseline information to study co-primary efficacy and toxicity outcomes. BEBOP stands for Bayesian Evaluation of Bivariate Binary Outcomes and Predictive Information. It was developed for the PePS2 trial, investigating pembrolizumab in non-small-cell lung cancer (NSCLC) patients with performance status 2 (PS2). A major factor in the PePS2 trial is the data on PS0 / 1 NSCLC patients published by Garon et al. (2015), showing that patients with greater PD-L1 tumour proportion scores are more likely to achieve an objective response. They introduce this predictive biomarker and validate a three-level categorisation Low, Medium and High PD-L1 score. There was also a suggestion (albeit not shown to be statistically significant) that previously untreated patients are more likely to have a response. These same variables will likely be predictive of response in the PS2 population. Brock et al. (publication in submission) developed BEBOP to make efficient use of this information.

BEBOP re-uses the probability model at the core of the EffTox design (Thall and Cook 2004, Thall et al. (2014)). Let \(x\) represent a vector of predictive baseline variables and \(\boldsymbol{\theta}\) a vector of parameters. The marginal probabilities of efficacy and toxicity are estimated using general functions \(\text{logit } \pi_E(x, \boldsymbol{\theta})\) and \(\text{logit } \pi_T(x, \boldsymbol{\theta})\) to be specified by the user.

In PePS2, we have \(x_{i1} = 1\) if a patient has been previously treated. To convey PD-L1 expression, the authors use \(x_{i2} = 1\) and \(x_{i3} = 0\) if a patient has a Low PD-L1 score; \(x_{i2} = 0\) and \(x_{i3} = 1\) if a patient has a Medium PD-L1 score; and \(x_{i2} = 0\) and \(x_{i3} = 0\) if a patient has a High PD-L1 score. Thus, we have \(x_i = (x_{i1}, x_{i2}, x_{i3})\). The marginal efficacy and toxicity functions in PePS2 take the form

\(\text{logit } \pi_E(x, \boldsymbol{\theta}) = \alpha + \beta x_{i1} + \gamma x_{i2} + \zeta x_{i3}\)

\(\text{logit } \pi_T(x, \boldsymbol{\theta}) = \lambda\)

Efficacy and toxicity events are then modelled using the joint probability function

\(\pi_{a,b} = (\pi_E)^a (1-\pi_E)^{1-a} (\pi_T)^b (1-\pi_T)^{1-b} + (-1)^{a+b} (\pi_E) (1-\pi_E) (\pi_T) (1-\pi_T) \frac{e^\psi-1}{e^\psi+1}\).

where \(a,b\) are dummy variables taking the value 1 if a patient experienced efficacy and toxicity respectively, else 0.

The complete vector of parameters is \(\boldsymbol{\theta} = (\alpha, \beta, \gamma, \zeta, \lambda, \psi)\). Normal priors are specified for the elements of \(\boldsymbol{\theta}\).

The treatment is acceptable for patients with predictive vector \(x\) if

\(\text{Pr}\left\{ \pi_E(x, \boldsymbol{\theta}) > \underline{\pi}_E | \mathcal{D} \right\} > p_E\)


\(\text{Pr}\left\{ \pi_T(x, \boldsymbol{\theta}) < \overline{\pi}_T | \mathcal{D} \right\} > p_T\)

where \(\underline{\pi}_E, \overline{\pi}_T, p_E, p_T\) are chosen for clinical relevance.

PePS2 is an all-comers trial, thus patients are admitted regardless of their PD-L1 or pre-treatment status. This is motivated by the dearth of treatment options for PS2 NSCLC patients who cannot use chemotherapy. The design allows the predictive information to effectively stratify the analysis without stratifying recruitment. The statistical design uses the common Bayesian tool of borrowing strength across groups to improve the performance of the analysis.

PePS2 Scenario in trialr

The cohorts in the PePS2 trial are

i Pretreated PDL1 x1 x2 x3
1 FALSE Low 0 1 0
2 FALSE Medium 0 0 1
3 FALSE High 0 0 0
4 TRUE Low 1 1 0
5 TRUE Medium 1 0 1
6 TRUE High 1 0 0

The trial uses a sample size of 60. Let us simulate a set of outcomes with the following efficacy and toxicity rates

peps2_sc <- function() peps2_get_data(num_patients = 60,
                                      prob_eff = c(0.167, 0.192, 0.5, 0.091, 0.156, 0.439),
                                      prob_tox = rep(0.1, 6),
                                      eff_tox_or = rep(1, 6))

dat <- peps2_sc()

In this example, we have used efficacy rates that increase in PD-L1 and are slightly higher in previously-uintreated patients. We use the uniform toxicity rate of 10% across all cohorts, and no association between efficacy and toxicity events, represented by odds-ratios equal to 1.

The dat object contains, for example, the prior parameters

c(dat$alpha_mean, dat$alpha_sd)
## [1] -2.2  2.0

and simulated predictive variables and efficacy and toxicity outcomes

  head(with(dat, data.frame(eff, tox, x1, x2, x3)), 10)
eff tox x1 x2 x3
0 0 0 1 0
0 0 0 1 0
0 0 0 1 0
0 0 0 1 0
0 1 0 0 1
0 0 0 0 1
0 0 0 0 1
1 0 0 0 1
0 0 0 0 1
0 0 0 0 1

We fit the data to the BEBOP model and obtain samples from the posterior distribution using rstan. The BebopInPeps2 model is provided by trialr and compiled when the package is installed.

samp <- rstan::sampling(stanmodels$BebopInPeps2, data = dat)
## trying deprecated constructor; please alert package maintainer

It is informative to view plots of posterior beliefs. Posterior samples of parameter values are available but they are less meaningful to us than the modelled efficacy rates, for example. View posterior distributions using code like

rstan::plot(samp, pars = 'prob_eff')
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
Posterior Prob(Efficacy) in the six PePS2 cohorts

Posterior Prob(Efficacy) in the six PePS2 cohorts

We see that the modelled rates of efficacy are highest in cohorts 3 and 6, but likely to be greater than the critical threshold of 10% in cohort 2 and perhaps cohort 5 as well. In contrast, there is not much evidence of clinical benefit in cohorts 1 and 4. This is confirmed by invoking the formally described analysis and associated decision rules.

decision <- peps2_process(dat, samp)
  with(decision, data.frame(ProbEff, ProbAccEff, ProbTox, ProbAccTox, Accept)), 
  digits = 3
ProbEff ProbAccEff ProbTox ProbAccTox Accept
0.074 0.254 0.069 1 FALSE
0.252 0.984 0.069 1 TRUE
0.428 0.988 0.069 1 TRUE
0.040 0.094 0.069 1 FALSE
0.151 0.745 0.069 1 TRUE
0.284 0.952 0.069 1 TRUE

ProbAccEff is the posterior probability that the efficacy rate in a cohort is greater than the 10% threshold. ProbAccTox is the probability that toxicity is less than 30%. The treatment is acceptable in a cohort if it is sufficiently efficacious and non-toxic. We see that in this simulated iteration, the treatment would be approved in all cohorts except 1 and 4.

We can perform the simulations on a greater number of iterations to learn about the operating characteristics of the design.

sims <- peps2_run_sims(num_sims = 10, sample_data_func = peps2_sc,
                       summarise_func = peps2_process)

In peps2_run_sims, the second and third args are delegates to simulate trial outcomes and post-process the rstan sample respectively. The outcome samping delelgate is called without arguments. The post-process delegate is called with first argument the object returned by the outcome sampling delegate (e.g. dat above), and second argument the posterior sample from rstan (e.g. samp above). The objects returned by the post-process delegate form the items in the sims object that are returned to the user by peps2_run_sims.

Thus, the probaility of approving the treatment using the statistical design in this scenario can be calculated using

apply(sapply(sims, function(x) x$Accept), 1, mean)


trialr is available at


Garon, Edward B, Naiyer a Rizvi, Rina Hui, Natasha Leighl, Ani S Balmanoukian, Joseph Paul Eder, Amita Patnaik, et al. 2015. “Pembrolizumab for the treatment of non-small-cell lung cancer.” The New England Journal of Medicine 372 (21): 2018–28. doi:10.1056/NEJMoa1501824.

Thall, PF, and JD Cook. 2004. “Dose-Finding Based on Efficacy-Toxicity Trade-Offs.” Biometrics 60 (3): 684–93.

Thall, PF, RC Herrick, HQ Nguyen, JJ Venier, and JC Norris. 2014. “Effective sample size for computing prior hyperparameters in Bayesian phase I-II dose-finding.” Clinical Trials 11 (6): 657–66. doi:10.1177/1740774514547397.