EffTox in trialr

Kristian Brock

2017-11-20

Recap of the EffTox model

Thall & Cook (2004) introduced the EffTox design for dose-finding clinical trials where both efficacy and toxicity events guide dose selection decisions. This is in contrast to methods like 3+3 and CRM (O’Quigley, Pepe, and Fisher 1990), where dose selection is determined by toxicity events only. We provide a brief recap of EffTox here; full details are given in Thall and Cook (2004), Thall, Cook, and Estey (2006) and Thall et al. (2014).

For doses \(\boldsymbol{y} = (y_1, ..., y_n)\), the authors define codified doses \((x_1, ..., x_n)\) using the transform

\(x_i = \log{y_i} - \sum_{j=1}^n \frac{\log{y_j}}{n}\)
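As a quick illustration of this transform, the following base R sketch codifies the five doses used in the example later in this document. The variable names are illustrative and are not taken from trialr.

```r
# Real doses from the prostate cancer example used later in this document
real_doses <- c(1, 2, 4, 6.6, 10)

# Codified dose: log-dose centred on the mean log-dose
coded_doses <- log(real_doses) - mean(log(real_doses))

round(coded_doses, 3)
```

By construction the codified doses sum to zero and preserve the ordering of the real doses.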

The codified doses are used as the sole explanatory variables in logit models for the marginal probabilities of toxicity and efficacy:

\(\text{logit } \pi_T = \alpha + \beta x\)

\(\text{logit } \pi_E = \gamma + \zeta x + \eta x^2\)

and the joint probability is modelled using the function

\(\pi_{a,b} = (\pi_E)^a (1-\pi_E)^{1-a} (\pi_T)^b (1-\pi_T)^{1-b} + (-1)^{a+b} (\pi_E) (1-\pi_E) (\pi_T) (1-\pi_T) \frac{e^\psi-1}{e^\psi+1}\).
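The marginal and joint models above can be sketched in a few lines of base R. This is an illustration of the formulae only, not the Stan code used by trialr; the function names and the parameter values are arbitrary assumptions chosen for the example.

```r
inv_logit <- function(z) 1 / (1 + exp(-z))

# Marginal probabilities at codified dose x, given parameter values
prob_tox <- function(x, alpha, beta) inv_logit(alpha + beta * x)
prob_eff <- function(x, gamma, zeta, eta) inv_logit(gamma + zeta * x + eta * x^2)

# Joint probability of efficacy outcome a and toxicity outcome b (each 0 or 1)
joint_prob <- function(a, b, pi_e, pi_t, psi) {
  pi_e^a * (1 - pi_e)^(1 - a) * pi_t^b * (1 - pi_t)^(1 - b) +
    (-1)^(a + b) * pi_e * (1 - pi_e) * pi_t * (1 - pi_t) *
    (exp(psi) - 1) / (exp(psi) + 1)
}

# Arbitrary illustrative parameter values (not estimates from any trial)
pi_t <- prob_tox(x = 0.2, alpha = -2, beta = 1.5)
pi_e <- prob_eff(x = 0.2, gamma = 0.5, zeta = 2, eta = 0)

# 2 x 2 table of joint probabilities: rows index efficacy, columns toxicity
cells <- outer(0:1, 0:1, function(a, b) joint_prob(a, b, pi_e, pi_t, psi = 0.5))
dimnames(cells) <- list(eff = c("0", "1"), tox = c("0", "1"))

sum(cells)         # the four cells sum to 1
sum(cells["1", ])  # recovers the marginal pi_e
sum(cells[, "1"])  # recovers the marginal pi_t
```

The association term cancels when summing over either outcome, which is why the marginals are unaffected by \(\psi\).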

Normal priors are specified for the elements of the parameter vector \(\boldsymbol{\theta} = (\alpha, \beta, \gamma, \zeta, \eta, \psi)\).

At each dose update decision, the dose \(x\) is acceptable if

\(\text{Pr}\left\{ \pi_T(x, \boldsymbol{\theta}) < \overline{\pi}_T | \mathcal{D} \right\} > p_T\)

and

\(\text{Pr}\left\{ \pi_E(x, \boldsymbol{\theta}) > \underline{\pi}_E | \mathcal{D} \right\} > p_E\)

and is no more than one position below the lowest dose-level given and no more than one position above the highest dose-level given. The net effect of these last two criteria is that untried doses may not be skipped in escalation or de-escalation. \(\underline{\pi}_E, p_E, \overline{\pi}_T, p_T\) are provided by the user as the trial scenario dictates.
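Given posterior draws of \(\pi_T\) and \(\pi_E\) at a dose, the acceptability criteria reduce to simple proportions. The sketch below uses Beta random numbers purely as stand-ins for MCMC output, and the helper is_within_reach is a hypothetical name for the no-skipping rule; neither is part of trialr.

```r
set.seed(1)

# Hypothetical posterior draws at one dose (stand-ins for MCMC output)
pi_t_draws <- rbeta(4000, 2, 8)
pi_e_draws <- rbeta(4000, 6, 4)

# User-supplied thresholds
tox_hurdle <- 0.3; p_t <- 0.1
eff_hurdle <- 0.5; p_e <- 0.1

# Posterior probabilities that the dose is tolerable and efficacious
prob_acc_tox <- mean(pi_t_draws < tox_hurdle)
prob_acc_eff <- mean(pi_e_draws > eff_hurdle)
acceptable_on_probs <- (prob_acc_tox > p_t) && (prob_acc_eff > p_e)

# No-skipping rule: at most one level beyond the range of doses already given
is_within_reach <- function(dose, doses_given) {
  dose >= min(doses_given) - 1 && dose <= max(doses_given) + 1
}

acceptable_on_probs
is_within_reach(3, doses_given = c(1, 1, 1, 2, 2, 2))
is_within_reach(4, doses_given = c(1, 1, 1, 2, 2, 2))
```

A dose is acceptable only if it passes both probability checks and the no-skipping rule.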

The utility of dose \(x\), with efficacy \(\pi_E(x, \boldsymbol{\theta})\) and toxicity \(\pi_T(x, \boldsymbol{\theta})\) is

\(u(\pi_E, \pi_T) = 1 - \left( \left(\frac{1-\pi_E}{1-\pi_{1,E}^*}\right)^p + \left(\frac{\pi_T}{\pi_{2,T}^*}\right)^p \right) ^ \frac{1}{p}\)

where \(p\) is calculated so that the neutral-utility contour \(u = 0\) passes through the points \((\pi_{1,E}^*, 0)\), \((1, \pi_{2,T}^*)\) and \((\pi_{3,E}^*, \pi_{3,T}^*)\) in the efficacy-toxicity plane. I refer to these as hinge points but that is not common nomenclature.
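The first two hinge points lie on the \(u = 0\) contour for any \(p > 0\), so \(p\) is determined by requiring that the third hinge point also has zero utility. A minimal sketch of that root-finding step in base R follows. It uses uniroot and is not necessarily how trialr solves for \(p\); the function names utility and solve_p are hypothetical, and the hinge points are those of the example used later in this document.

```r
# Utility of a (pi_e, pi_t) pair for curvature p and the two axis hinge points
utility <- function(pi_e, pi_t, p, eff0, tox1) {
  1 - (((1 - pi_e) / (1 - eff0))^p + (pi_t / tox1)^p)^(1 / p)
}

# Find p such that the third hinge point (eff_star, tox_star) has utility 0
solve_p <- function(eff0, tox1, eff_star, tox_star) {
  f <- function(p) utility(eff_star, tox_star, p, eff0, tox1)
  uniroot(f, interval = c(0.01, 10), tol = 1e-10)$root
}

p_sketch <- solve_p(eff0 = 0.5, tox1 = 0.65, eff_star = 0.7, tox_star = 0.25)
p_sketch

# All three hinge points should sit on the neutral contour
utility(c(0.5, 1, 0.7), c(0, 0.65, 0.25), p_sketch, eff0 = 0.5, tox1 = 0.65)
```

The value found should be close to the \(p \approx 0.977\) reported for this example later in the document.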

At the dose selection decision, the dose-level from the acceptable set with maximal utility is selected to be given to the next patient or cohort. If there are no acceptable doses, the trial stops and no dose is recommended.

There are several published EffTox examples, including explanations and tips on parameter choices (Thall and Cook 2004; Thall, Cook, and Estey 2006; Thall et al. 2014).

The MD Anderson Cancer Center publishes software (R. Herrick et al. 2015) to perform calculations and simulations for EffTox trials. However, the software is available for Windows in compiled-form only. Thus, trialists cannot run the software on Mac or Linux unless using a virtual machine. Furthermore, trialists may not easily alter the behaviour of the model. Brock et al. (submitted 2017) describe a clinical trial scenario where some alteration of the default model behaviour would have been preferable. It was this that prompted the author to write the open-source implementation provided in ‘trialr’.

EffTox in trialr

We will work with the advanced prostate cancer example given in Thall et al. (2014). They investigate the five doses 1, 2, 4, 6.6 and 10 mcL/kg, seeking the best dose with

\(\text{Pr}\left\{ \pi_T(x, \boldsymbol{\theta}) < 0.3 | \mathcal{D} \right\} > 0.1\)

and

\(\text{Pr}\left\{ \pi_E(x, \boldsymbol{\theta}) > 0.5 | \mathcal{D} \right\} > 0.1\)

Thus, we have \(p_T = p_E = 0.1, \overline{\pi}_T = 0.3\) and \(\underline{\pi}_E = 0.5\).

The parameterisation for this trial is loaded by default in the MD Anderson EffTox app. It can be loaded in trialr using the command

library(trialr)
dat <- efftox_parameters_demo()

Their hinge points on the neutral utility curve are (0.5, 0), (1, 0.65) and (0.7, 0.25). The curvature parameter for the utility contours is calculated using

p <- efftox_solve_p(eff0 = 0.5, tox1 = 0.65, eff_star = 0.7, tox_star = 0.25)
p
## [1] 0.9773632

The utility curves in this particular example have curvature parameter \(p \approx 0.977\).

We fetched the dat object above by calling efftox_parameters_demo. That function is provided for convenience. It is just shorthand for

dat <- list(
  num_doses = 5,
  real_doses = c(1, 2, 4, 6.6, 10),
  efficacy_hurdle = 0.5,
  toxicity_hurdle = 0.3,
  p_e = 0.1,
  p_t = 0.1,
  p = p,
  eff0 = 0.5,
  tox1 = 0.65,
  eff_star = 0.7,
  tox_star = 0.25,
  
  alpha_mean = -7.9593, alpha_sd = 3.5487,
  beta_mean = 1.5482, beta_sd = 3.5018,
  gamma_mean = 0.7367, gamma_sd = 2.5423,
  zeta_mean = 3.4181, zeta_sd = 2.4406,
  eta_mean = 0, eta_sd = 0.2,
  psi_mean = 0, psi_sd = 1,
  
  eff = c(),
  tox = c(),
  doses = c(),
  num_patients = 0
)

Refer to the published paper for information on the normal priors.

To study the method, let us say that we have treated six patients in two cohorts of three at dose-levels 1 and 2 respectively. Let us also say we observe no toxicities and one efficacy event in cohort 1, and one toxicity and two efficacies in cohort 2. We record this information in the dat object that is eventually passed to rstan to perform posterior sampling.

dat$doses = c(1, 1, 1, 2, 2, 2)
dat$tox   = c(0, 0, 0, 0, 1, 0)
dat$eff   = c(0, 1, 0, 0, 1, 1)
dat$num_patients = 6
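The figure captions in this document abbreviate these outcomes as 1NEN 2NBE. Assuming the letters mean N = neither event, E = efficacy only, T = toxicity only and B = both (an assumption, since the notation is not defined here), a small hypothetical helper can turn such a string into the vectors entered above. The function parse_outcomes is written for this illustration and is not presented as part of trialr.

```r
# Hypothetical helper: convert an outcome string such as "1NEN 2NBE" into
# dose, toxicity and efficacy vectors.
# Assumed coding: N = neither, E = efficacy only, T = toxicity only, B = both.
parse_outcomes <- function(outcome_string) {
  cohorts <- strsplit(trimws(outcome_string), "\\s+")[[1]]
  doses <- integer(0); tox <- integer(0); eff <- integer(0)
  for (cohort in cohorts) {
    # Leading digits give the dose-level, remaining letters the outcomes
    dose <- as.integer(sub("^([0-9]+).*$", "\\1", cohort))
    outcomes <- strsplit(sub("^[0-9]+", "", cohort), "")[[1]]
    doses <- c(doses, rep(dose, length(outcomes)))
    tox <- c(tox, as.integer(outcomes %in% c("T", "B")))
    eff <- c(eff, as.integer(outcomes %in% c("E", "B")))
  }
  list(doses = doses, tox = tox, eff = eff, num_patients = length(doses))
}

parsed <- parse_outcomes("1NEN 2NBE")
str(parsed)
```

Under that coding, the parsed vectors match the doses, tox and eff values assigned to dat above.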

This completes the information required by EffTox to calculate the next dose. We invoke the prior-to-posterior analysis by calling the rstan::sampling function with the pre-compiled EffTox stanmodel in trialr (see note 1) and our data list. We set the seed for reproducibility.

samp <- rstan::sampling(stanmodels$EffTox, data = dat, seed = 123)
## trying deprecated constructor; please alert package maintainer

The call to rstan::sampling prints logging information to the standard output stream. It has been suppressed here for brevity.

x <- efftox_process(dat, samp)

The call to efftox_process extracts the most pertinent information from the sampling object.

x$recommended_dose
## [1] 3

We see that dose-level 3 is recommended for the next cohort.

knitr::kable(efftox_analysis_to_df(x), digits = 3)
ProbEff and ProbTox are the probabilities of efficacy and toxicity at each dose. ProbAccEff is the probability that the efficacy rate exceeds the desired threshold and ProbAccTox the probability that the toxicity rate is less than the threshold. All probabilities are posterior means.
DoseLevel  ProbEff  ProbTox  ProbAccEff  ProbAccTox  Utility  Acceptable
        1    0.306    0.088       0.178       0.926   -0.533  TRUE
        2    0.630    0.098       0.758       0.928    0.100  TRUE
        3    0.834    0.212       0.920       0.734    0.331  TRUE
        4    0.887    0.301       0.936       0.629    0.301  FALSE
        5    0.907    0.359       0.938       0.572    0.251  FALSE

This confirms that dose-level 3 has maximal utility. We see that dose-levels 4 and 5 are unacceptable. This is because dose-level 3 has not yet been given. No doses are inferred to be too toxic or inefficacious yet.
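As a cross-check, the tabulated utilities can be approximately recovered by plugging the posterior mean probabilities into the utility formula from the recap. That the Utility column is evaluated at posterior means is an inference from the numbers in the table, not a statement about the internals of efftox_process, and agreement is only expected up to rounding of the tabulated inputs.

```r
p <- 0.9773632          # curvature parameter reported earlier
eff0 <- 0.5; tox1 <- 0.65

# Values copied from the table above
prob_eff <- c(0.306, 0.630, 0.834, 0.887, 0.907)
prob_tox <- c(0.088, 0.098, 0.212, 0.301, 0.359)
acceptable <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
reported_utility <- c(-0.533, 0.100, 0.331, 0.301, 0.251)

# Utility evaluated at the posterior mean efficacy and toxicity probabilities
utility_at_means <- 1 -
  (((1 - prob_eff) / (1 - eff0))^p + (prob_tox / tox1)^p)^(1 / p)
round(utility_at_means, 3)

# Dose selection: highest utility among the acceptable doses
candidates <- which(acceptable)
selected <- candidates[which.max(utility_at_means[candidates])]
selected
```

The selected dose-level agrees with x$recommended_dose.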

We can produce contour plots.

efftox_contour_plot(dat, prob_eff = x$prob_eff, prob_tox = x$prob_tox)
title('EffTox utility contours')
Utility contours after observing outcomes 1NEN 2NBE.

The blue stars show the location of the hinge points on the neutral-utility (u=0) contour. The red numbers show the posterior mean efficacy and toxicity probabilities of the five dose-levels. Doses that are closer to the lower-right corner have higher utility. We see that dose-level 3 has the highest utility.

We can also produce posterior density plots of the dose utilities. For illustration, we will just plot the densities of the three acceptable doses. The package ggplot2 is required.

efftox_utility_density_plot(samp, doses = 1:3) +
  ggplot2::ggtitle("EffTox dose utility densities")
Utility densities after observing outcomes 1NEN 2NBE.

To further facilitate the analysis of dose utility, we provide means of calculating the dose superiority matrix.

knitr::kable(efftox_superiority(samp), digits = 2, row.names = TRUE)
Dose     1     2     3     4     5
   1    NA  0.91  0.86  0.83  0.80
   2  0.09    NA  0.76  0.69  0.64
   3  0.14  0.24    NA  0.57  0.54
   4  0.17  0.31  0.43    NA  0.51
   5  0.20  0.36  0.46  0.49    NA

The element in row \(i\) and column \(j\) shows \(\text{Prob(}u_j > u_i | \text{data)}\), where \(u_k\) is the utility of dose \(k\). We can be quite confident that dose 3 has higher utility than doses 1 and 2. In contrast, the model is much more vague about which is the superior of doses 3, 4 and 5. More data may be illuminating.
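To make the definition concrete, the sketch below builds a superiority matrix from a matrix of utility draws, with one row per posterior draw and one column per dose. The draws are simulated from normal distributions purely for illustration; this shows the definition above and is not the code behind efftox_superiority.

```r
set.seed(42)

# Hypothetical utility draws: 1000 draws (rows) for 3 doses (columns)
u_draws <- cbind(rnorm(1000, -0.5, 0.3),
                 rnorm(1000,  0.1, 0.3),
                 rnorm(1000,  0.3, 0.3))

n_doses <- ncol(u_draws)
dose_names <- as.character(seq_len(n_doses))
superiority <- matrix(NA_real_, n_doses, n_doses,
                      dimnames = list(dose_names, dose_names))

# Element [i, j] estimates Prob(u_j > u_i | data); the diagonal stays NA
for (i in seq_len(n_doses)) {
  for (j in seq_len(n_doses)) {
    if (i != j) superiority[i, j] <- mean(u_draws[, j] > u_draws[, i])
  }
}

round(superiority, 2)
```

With continuous draws there are no ties, so elements [i, j] and [j, i] sum to one, as they do in the table above.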

Simulation

We also provide a way to simulate EffTox trial scenarios. Simulation allows trialists to assess the performance of their design.

In addition to the parameters already described, the user must provide the true probabilities of efficacy and toxicity at each dose, and a vector of desired cohort sizes. For illustration, we simulate Scenario 1 under contour \(C_2\) in Table 1 of Thall et al. (2014). They use the EffTox parameterisation described above to investigate performance under true efficacy probabilities (0.2, 0.4, 0.6, 0.8, 0.9) and toxicity probabilities (0.05, 0.1, 0.15, 0.2, 0.4).

The preferable dose is calculated after the evaluation of each cohort and the following cohort is treated at the dose recommended. We provide the desired cohort sizes as a vector of integers via the cohort_sizes parameter. In the Thall example, they treat a maximum of 39 patients in thirteen cohorts of 3, thus we use cohort_sizes = rep(3, 13).

In the trialr EffTox implementation, the cohort sizes need not be constant. We can specify different cohort sizes, an option not possible in the official EffTox software. For instance, the investigators might want to re-evaluate the ideal dose after every patient in the early trial stages, where information is scarce. However, once a certain number of patient outcomes have been observed, they might prefer to revert to cohorts of three to avoid unnecessarily frequent analyses. To analyse the dose after each patient for the first 9 patients, followed by ten cohorts of three, we could use cohort_sizes = c(rep(1, 9), rep(3, 10)).
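The two cohort schedules discussed here, thirteen cohorts of three and nine single-patient cohorts followed by ten cohorts of three, can be written and checked in base R. The variable names are illustrative.

```r
# Thirteen cohorts of three, as in the Thall et al. (2014) example
cohort_sizes_fixed <- rep(3, 13)

# Nine single-patient cohorts followed by ten cohorts of three
cohort_sizes_mixed <- c(rep(1, 9), rep(3, 10))

# Maximum sample size under each schedule
sum(cohort_sizes_fixed)
sum(cohort_sizes_mixed)

# Maximum number of dose decisions (one per cohort) under each schedule
length(cohort_sizes_fixed)
length(cohort_sizes_mixed)
```

Both schedules treat at most 39 patients, but the mixed schedule involves up to 19 dose decisions rather than 13, which increases the simulation time accordingly.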

Once again, we need a list of data to pass to the RStan sampler.

dat <- list(
  num_doses = 5,
  real_doses = c(1, 2, 4, 6.6, 10),
  efficacy_hurdle = 0.5,
  toxicity_hurdle = 0.3,
  p_e = 0.1,
  p_t = 0.1,
  p = p,
  eff0 = 0.5,
  tox1 = 0.65,
  eff_star = 0.7,
  tox_star = 0.25,

  alpha_mean = -7.9593, alpha_sd = 3.5487,
  beta_mean = 1.5482, beta_sd = 3.5018,
  gamma_mean = 0.7367, gamma_sd = 2.5423,
  zeta_mean = 3.4181, zeta_sd = 2.4406,
  eta_mean = 0, eta_sd = 0.2,
  psi_mean = 0, psi_sd = 1,

  doses = c(),
  tox   = c(),
  eff   = c(),
  num_patients = 0
)

The elements in dat reflect the point from which each simulated trial iteration will commence. Note that num_patients, doses, tox and eff convey that no patients have yet been observed. This is not a constraint. Unlike the official EffTox software, users may simulate the commencement of a partially-observed EffTox trial by tailoring these four items. To simulate trials starting from a blank canvas, these four elements should be set as shown above.

The following code will run a simulation.

set.seed(123)
sims = efftox_simulate(dat, num_sims = 100, first_dose = 1, 
                       true_eff = c(0.20, 0.40, 0.60, 0.80, 0.90),
                       true_tox = c(0.05, 0.10, 0.15, 0.20, 0.40),
                       cohort_sizes = rep(3, 13))

The doses recommended at the end of each simulated trial are recorded in the recommended_dose element of the sims object. For instance, we can estimate from the simulated trials the probability that each dose will be recommended using

table(sims$recommended_dose) / length(sims$recommended_dose)

Similarly, you can calculate the probability of each dose being given to a random patient using

table(unlist(sims$doses_given)) / length(unlist(sims$doses_given))

and the mean number of patients being treated at each dose-level in each simulation

table(unlist(sims$doses_given)) / length(sims$recommended_dose)
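One caveat with table is that dose-levels that never occur are dropped from the output. A hedged variant is sketched below on a small mock object. The list sims_mock is a hypothetical stand-in with the two elements used above, and recording a trial that stops without a recommendation as NA is an assumption, not something confirmed here about efftox_simulate.

```r
# Hypothetical stand-in for the output of four simulated trials
sims_mock <- list(
  recommended_dose = c(3, 4, NA, 4),  # NA assumed to denote an early stop
  doses_given = list(c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                     c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
                     c(1, 1, 1),
                     c(1, 1, 1, 2, 2, 2, 3, 3, 3))
)
dose_levels <- 1:5
num_trials <- length(sims_mock$recommended_dose)

# Fixing the factor levels keeps dose-levels with zero counts in the output
recommended <- factor(sims_mock$recommended_dose, levels = dose_levels)
prob_recommended <- table(recommended, useNA = "ifany") / num_trials

given <- factor(unlist(sims_mock$doses_given), levels = dose_levels)
prob_given <- table(given) / length(given)
mean_patients_per_dose <- table(given) / num_trials

prob_recommended
prob_given
mean_patients_per_dose
```

Keeping every dose-level in the output makes results easier to compare across scenarios, since each summary then has the same length.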

Simulation speed

The call to efftox_simulate will take about 25 minutes to run 100 iterations. This is much slower than the official Windows EffTox app, which would typically take a number of seconds rather than minutes. The official app is written in C++ and uses a fast numerical method for resolving Bayesian integrals with Gaussian priors called spherical radial integration, presented by Monahan & Genz (1997). In contrast, trialr delegates posterior sampling to rstan. The time it takes to run simulated EffTox trials in trialr is largely driven by the time it takes to run rstan::sampling many times. We are observing that MCMC is generally slower than spherical radial integration. However, it is also much more flexible. The trialr implementation of EffTox is easy to customise. Any aspect of the probability model may be changed by altering our open-source implementation. Non-normal priors may be used without any obligation to re-implement the prior-to-posterior analysis method. RStan handles the general model specification, rather than optimising for particular circumstances.

The process of simulating dose-finding trials is relatively computationally intensive because the dose update decision is made at the end of each cohort. In the above scenario, the dose decision is performed up to 13 times per iteration. When calling rstan::sampling, 4 chains of 2000 iterations are used by default. Thus, 100 iterations of the above scenario involve up to \(100 \times 13 \times 4 = 5200\) chains sampled from the posterior distribution. If each sampled chain takes a fraction of a second (approximately 0.3s on my computer), the aggregate run-time is of the order of 20-30 minutes. Users can alter the number of chains used, the number of iterations per chain, and the amount of thinning by providing extra arguments via the ellipsis (...) to efftox_simulate. These are then forwarded to the rstan::sampling function.
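The run-time arithmetic can be wrapped in a small helper to explore the effect of using fewer chains. The function estimate_minutes is a hypothetical back-of-envelope calculation, and the 0.3 seconds per chain is the rough figure quoted above rather than a benchmark.

```r
# Rough run-time estimate in minutes: simulations x dose decisions x chains
estimate_minutes <- function(num_sims, num_decisions, chains, secs_per_chain) {
  num_sims * num_decisions * chains * secs_per_chain / 60
}

# Scenario above: 100 simulated trials, up to 13 decisions, 4 chains each
minutes_4_chains <- estimate_minutes(100, 13, chains = 4, secs_per_chain = 0.3)

# The same scenario with 2 chains per decision
minutes_2_chains <- estimate_minutes(100, 13, chains = 2, secs_per_chain = 0.3)

c(minutes_4_chains, minutes_2_chains)
```

Under these assumptions the estimate is about 26 minutes with four chains and about 13 with two, at the cost of fewer posterior draws per decision.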

Simulation is a costly exercise! Striking the balance between speed and flexibility is difficult. Sometimes a slow, flexible method will be preferable to a fast, fixed method. It pays to hone parameters on small exploratory batches and commit to large jobs when ready.

trialr

trialr is available at https://github.com/brockk/trialr

References

Herrick, R, C Norris, JD Cook, and J Venier. 2015. “EffTox.” MD Anderson Cancer Center. https://biostatistics.mdanderson.org/softwaredownload/SingleSoftware.aspx?Software_Id=2.

Monahan, John, and Alan Genz. 1997. “Spherical-Radial Integration Rules for Bayesian Computation.” Journal of the American Statistical Association 92 (438): 664–74. http://www.jstor.org/stable/2965714.

O’Quigley, J, M Pepe, and L Fisher. 1990. “Continual reassessment method: a practical design for phase 1 clinical trials in cancer.” Biometrics 46 (1): 33–48. doi:10.2307/2531628.

Thall, PF, and JD Cook. 2004. “Dose-Finding Based on Efficacy-Toxicity Trade-Offs.” Biometrics 60 (3): 684–93.

Thall, PF, JD Cook, and EH Estey. 2006. “Adaptive dose selection using efficacy-toxicity trade-offs: illustrations and practical considerations.” Journal of Biopharmaceutical Statistics 16 (5): 623–38. doi:10.1080/10543400600860394.

Thall, PF, RC Herrick, HQ Nguyen, JJ Venier, and JC Norris. 2014. “Effective sample size for computing prior hyperparameters in Bayesian phase I-II dose-finding.” Clinical Trials 11 (6): 657–66. doi:10.1177/1740774514547397.


  1. The advice by the RStan team is that authors of packages that use RStan should provide static model files that are compiled when the package is installed.