Benchmark Testing

Kendal Foster and Henrik Singmann

November 04, 2020

Function dfddm evaluates the density function (or probability density function, PDF) for the Ratcliff diffusion decision model (DDM) using different methods for approximating the full PDF, which contains an infinite sum. An overview of the mathematical details of the different approximations is provided in the Math Vignette. An empirical validation of the implemented methods is provided in the Validity Vignette. Two examples of using dfddm for parameter estimation are provided in the Example Vignette.

Our implementation of the DDM has the following parameters: \(a \in (0, \infty)\) (threshold separation), \(v \in (-\infty, \infty)\) (drift rate), \(t_0 \in [0, \infty)\) (non-decision time/response time constant), \(w \in (0, 1)\) (relative starting point), and \(sv \in (0, \infty)\) (inter-trial-variability of drift).

Introduction


Since the DDM is widely used in parameter estimation usually involving numerical optimization, significant effort has been put into making the evaluation of its density as fast as possible. However, the density function for the DDM is notorious for containing an unavoidable infinite sum; hence, the literature has produced a few different methods of approximating the density. This vignette is designed to compare the speed of the currently available algorithms that calculate the lower probability density of the DDM. To this aim, we present benchmark data of currently available methods from the literature, our streamlined implementations of those methods, and our own novel methods. The first section will record the benchmark data of the density function approximations themselves and show the results through a series of visualizations. The second section will record the benchmark data of parameter estimation that uses the density function approximations in the optimization process for fitting to real-world data; this section will also include visualizations to illustrate the differences among the density function approximations. Please note that we will not be testing the functions for accuracy or consistency in this vignette, as that is covered in the Validity Vignette.

Benchmarking the Density Functions


We want to determine the performance of each method across a wide variety of parameters in order to identify any slow areas for individual methods. To achieve this rigor, we define an extensive parameter space (in a code chunk below) and loop through each combination of parameters. For the preliminary benchmark testing, we will input the response times as a vector to each function instead of inputting individual response times for two reasons: first, this is the most common way to input the response time data; and second, this allows for the R-based method to exploit vectorization to make that method as fast as possible. For each combination of parameters, we run the microbenchmark function from the package microbenchmark 100 times for each method and only save the median benchmark time of these 100. We will refer to these as the “median benchmark times” for the remainder of this vignette. Running the benchmark tests this many times for each set of parameters generates a sufficient amount of benchmark data so that the median of these times is unlikely to be an outlier, either faster or slower than a typical run time.

Upon collecting the benchmark data, we perform a preliminary analysis of the approximations’ performance by showing the distributions of median benchmark times as side-by-side violin plots for each approximation. Then we cull the methods and only test further on the fastest and most widely used approximations. To elucidate the areas of the parameter space where the different methods excel and struggle, we plot the median benchmark times as a function of individual parameter values. These plots will reveal which model parameters affect the efficiency of each approximation and how those parameters can slow down the approximations.

Generating Benchmark Data

Before analyzing how the different density function approximations perform, we must first generate the benchmark data. In addition to testing all of the methods available in dfddm, we also include three functions that are considered to be current and widely used. First, we include the function ddiffusion from the package rtdists as it is well known for being not only user friendly and feature-rich but also designed specifically for handling data with regard to distributions for response time models. Second, we also test the dwiener function from the RWiener package, which is mainly aimed at providing an R language interface to calculate various functions from the Wiener process. Third, we include some raw R code that is bundled with the paper written by Gondan, Blurton, and Kesselmeier (2014) about improving the approximation to the density function. As this code was not yet available in an R package, it is part of fddm and can be accessed after running source on the corresponding file as shown below.

To capture the behavior of the density function approximations, we test each one in a granular parameter space that includes the parameter values most likely used in practice. This parameter space can be found fully defined in the code chunk later in this section. Note that we are only testing the “lower” density function as the conversion from the “lower” density function to the “upper” density function involves only the mappings \(v \to -v\) and \(w \to 1-w\). Since our parameter space includes the negatives of all positive values of \(v\) and the complements of all values of \(w\), testing both the “lower” and “upper” density functions would be redundant.

In addition, we include two functions for benchmarking the performance of the approximations: one where the response times are input as a vector, and one where the response times are input one at a time to the density function. Inputting the response times as a vector is the most common way of inputting data to the density function and allows the R code from Gondan, Blurton, and Kesselmeier (2014) to exploit R’s powerful vectorization, operating at its most efficient. We also include a benchmark function that inputs the response times individually as this allows us to see the impact that the size of the response time has on the evaluation time of the density function approximations.

At each point in this parameter space, we run the microbenchmark function 100 times for each approximation and save the median of these 100 benchmark times. Using the median of these data should reduce the number of outliers in the data while maintaining a truthful representation of the approximations’ performance. The following code chunk defines the functions that we will use to benchmark the algorithms; we will be using the microbenchmark package to time the evaluations.

library("fddm")
library("rtdists")
library("RWiener")
source(system.file("extdata", "Gondan_et_al_density.R", package = "fddm", mustWork = TRUE))
library("microbenchmark")

rt_benchmark_vec <- function(RT, resp, V, A, t0 = 1e-4, W = 0.5, SV = 0.0,
                             err_tol = 1e-6, times = 100, unit = "ns") {

  fnames <- c("fs_SWSE_17", "fs_SWSE_14", "fb_SWSE_17", "fb_SWSE_14",
              "fs_Gon_17", "fs_Gon_14", "fb_Gon_17", "fb_Gon_14",
              "fs_Nav_17", "fs_Nav_14", "fb_Nav_17", "fb_Nav_14",
              "fl_Nav_09", "RWiener", "Gondan", "rtdists")
  nf <- length(fnames) # number of functions being benchmarked
  nV <- length(V)
  nA <- length(A)
  nW <- length(W)
  nSV <- length(SV)
  resp <- rep(resp, length(RT)) # for RWiener

  # Initialize the dataframe to contain the microbenchmark results
  mbm_res <- data.frame(matrix(ncol = 4+nf, nrow = nV*nA*nW*nSV))
  colnames(mbm_res) <- c('V', 'A', 'W', 'SV', fnames)
  row_idx <- 1

  # Loop through each combination of parameters and record microbenchmark results
  for (v in 1:nV) {
    for (a in 1:nA) {
      for (w in 1:nW) {
        for (sv in 1:nSV) {
          mbm <- microbenchmark(
          fs_SWSE_17 = dfddm(rt = RT, response = resp, a = A[a],
                             v = V[v], t0 = t0, w = W[w],
                             log = FALSE, n_terms_small = "SWSE",
                             summation_small = "2017", scale = "small",
                             err_tol = err_tol),
          fs_SWSE_14 = dfddm(rt = RT, response = resp, a = A[a],
                             v = V[v], t0 = t0, w = W[w],
                             log = FALSE, n_terms_small = "SWSE",
                             summation_small = "2014", scale = "small",
                             err_tol = err_tol),
          fb_SWSE_17 = dfddm(rt = RT, response = resp, a = A[a],
                             v = V[v], t0 = t0, w = W[w],
                             log = FALSE, n_terms_small = "SWSE",
                             summation_small = "2017", scale = "both",
                             err_tol = err_tol),
          fb_SWSE_14 = dfddm(rt = RT, response = resp, a = A[a],
                             v = V[v], t0 = t0, w = W[w],
                             log = FALSE, n_terms_small = "SWSE",
                             summation_small = "2014", scale = "both",
                             err_tol = err_tol),
          fs_Gon_17 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Gondan",
                            summation_small = "2017", scale = "small",
                            err_tol = err_tol),
          fs_Gon_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Gondan",
                            summation_small = "2014", scale = "small",
                            err_tol = err_tol),
          fb_Gon_17 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Gondan",
                            summation_small = "2017", scale = "both",
                            err_tol = err_tol),
          fb_Gon_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Gondan",
                            summation_small = "2014", scale = "both",
                            err_tol = err_tol),
          fs_Nav_17 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            summation_small = "2017", scale = "small",
                            err_tol = err_tol),
          fs_Nav_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            summation_small = "2014", scale = "small",
                            err_tol = err_tol),
          fb_Nav_17 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            summation_small = "2017", scale = "both",
                            err_tol = err_tol),
          fb_Nav_14 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            summation_small = "2014", scale = "both",
                            err_tol = err_tol),
          fl_Nav_09 = dfddm(rt = RT, response = resp, a = A[a],
                            v = V[v], t0 = t0, w = W[w],
                            log = FALSE, n_terms_small = "Navarro",
                            scale = "large", err_tol = err_tol),
          RWiener = dwiener(RT, resp = resp, alpha = A[a],
                            delta = V[v], tau = t0, beta = W[w],
                            give_log = FALSE),
          Gondan = fs(t = RT-t0, a = A[a], v = V[v],
                      w = W[w], eps = err_tol), # only "lower" resp
          rtdists = ddiffusion(RT, resp, a = A[a], v = V[v],
                               t0 = t0, z = W[w]*A[a]),
          times = times, unit = unit)
        # add the v, a, and w values to the dataframe
        mbm_res[row_idx, 1] <- V[v]
        mbm_res[row_idx, 2] <- A[a]
        mbm_res[row_idx, 3] <- W[w]
        mbm_res[row_idx, 4] <- SV[sv]
        # add the median microbenchmark results to the dataframe
        for (i in 1:nf) {
          mbm_res[row_idx, 4+i] <- median(mbm[mbm[,1] == fnames[i],2])
        }
        # iterate start value
        row_idx = row_idx + 1
        }
      }
    }
  }
  return(mbm_res)
}

rt_benchmark_ind <- function(RT, resp, V, A, t0 = 1e-4, W = 0.5, SV = 0.0,
                             err_tol = 1e-6, times = 100, unit = "ns") {
  fnames <- c("fs_SWSE_17", "fs_SWSE_14", "fb_SWSE_17", "fb_SWSE_14",
              "fs_Gon_17", "fs_Gon_14", "fb_Gon_17", "fb_Gon_14",
              "fs_Nav_17", "fs_Nav_14", "fb_Nav_17", "fb_Nav_14",
              "fl_Nav_09", "RWiener", "Gondan", "rtdists")
  nf <- length(fnames) # number of functions being benchmarked
  nRT <- length(RT)
  nV <- length(V)
  nA <- length(A)
  nW <- length(W)
  nSV <- length(SV)

  # Initialize the dataframe to contain the microbenchmark results
  mbm_res <- data.frame(matrix(ncol = 5+nf, nrow = nRT*nV*nA*nW*nSV))
  colnames(mbm_res) <- c('RT', 'V', 'A', 'W', 'SV', fnames)
  row_idx <- 1

  # Loop through each combination of parameters and record microbenchmark results
  for (rt in 1:nRT) {
    for (v in 1:nV) {
      for (a in 1:nA) {
        for (w in 1:nW) {
          for (sv in 1:nSV) {
            mbm <- microbenchmark(
            fs_SWSE_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "SWSE",
                              summation_small = "2017", scale = "small",
                              err_tol = err_tol),
            fs_SWSE_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "SWSE",
                              summation_small = "2014", scale = "small",
                              err_tol = err_tol),
            fb_SWSE_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "SWSE",
                              summation_small = "2017", scale = "both",
                              err_tol = err_tol),
            fb_SWSE_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "SWSE",
                              summation_small = "2014", scale = "both",
                              err_tol = err_tol),
            fs_Gon_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Gondan",
                              summation_small = "2017", scale = "small",
                              err_tol = err_tol),
            fs_Gon_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Gondan",
                              summation_small = "2014", scale = "small",
                              err_tol = err_tol),
            fb_Gon_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Gondan",
                              summation_small = "2017", scale = "both",
                              err_tol = err_tol),
            fb_Gon_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Gondan",
                              summation_small = "2014", scale = "both",
                              err_tol = err_tol),
            fs_Nav_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              summation_small = "2017", scale = "small",
                              err_tol = err_tol),
            fs_Nav_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              summation_small = "2014", scale = "small",
                              err_tol = err_tol),
            fb_Nav_17 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              summation_small = "2017", scale = "both",
                              err_tol = err_tol),
            fb_Nav_14 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              summation_small = "2014", scale = "both",
                              err_tol = err_tol),
            fl_Nav_09 = dfddm(rt = RT[rt], response = resp, a = A[a],
                              v = V[v], t0 = t0, w = W[w],
                              log = FALSE, n_terms_small = "Navarro",
                              scale = "large", err_tol = err_tol),
            RWiener = dwiener(RT[rt], resp = resp, alpha = A[a],
                              delta = V[v], tau = t0, beta = W[w],
                              give_log = FALSE),
            Gondan = fs(t = RT[rt]-t0, a = A[a], v = V[v],
                        w = W[w], eps = err_tol), # only "lower" resp
            rtdists = ddiffusion(RT[rt], resp, a = A[a], v = V[v],
                                 t0 = t0, z = W[w]*A[a]),
            times = times, unit = unit)
          # add the v, a, and w values to the dataframe
          mbm_res[row_idx, 1] <- RT[rt]
          mbm_res[row_idx, 2] <- V[v]
          mbm_res[row_idx, 3] <- A[a]
          mbm_res[row_idx, 4] <- W[w]
          mbm_res[row_idx, 5] <- SV[sv]
          # add the median microbenchmark results to the dataframe
          for (i in 1:nf) {
            mbm_res[row_idx, 5+i] <- median(mbm[mbm[,1] == fnames[i],2])
          }
          # iterate start value
          row_idx = row_idx + 1
          }
        }
      }
    }
  }
  return(mbm_res)
}

Control loops in R have a lot of overhead and are comparatively slow compared to loops in C++; thus, we will not run rt_benchmark_vec() in this vignette, and we will load pre-run benchmark data instead. The pre-run data was collected using the same functions and parameter space as defined in this vignette. Since this benchmark function contains one fewer loop because of the vectorized response times, we increase the number of samples that the ‘microbenchmark’ function considers when timing the various algorithms. To show comparisons across the algorithms with respect to the input response time, we will again load pre-run benchmark data obtained through using rt_benchmark_ind() as this takes even longer to evaluate than the benchmark with vectorized response times. The following code chunk defines the parameter space to be explored throughout the benchmark testing and shows an example of how to run the benchmark functions defined in the previous code chunk.

# Define parameter space
RT <- c(0.001, 0.1, 1, 2, 3, 4, 5, 10, 30)
A <- c(0.25, 0.5, 1, 2.5, 5)
V <- c(-5, -2, 0, 2, 5)
t0 <- 1e-4 # must be nonzero for RWiener
W <- c(0.2, 0.5, 0.8)
SV <- c(0, 0.5, 1, 1.5)
err_tol <- 1e-6 # this is the setting from rtdists

# Run benchmark tests
bm_vec <- rt_benchmark_vec(RT = RT, resp = "lower", V = V, A = A, t0 = t0,
                           W = W, SV = SV, err_tol = err_tol,
                           times = 1000, unit = "ns")
bm_ind <- rt_benchmark_ind(RT = RT, resp = "lower", V = V, A = A, t0 = t0,
                           W = W, SV = SV, err_tol = err_tol,
                           times = 100, unit = "ns")

Analysis of Benchmark Results

The first step in analyzing the benchmark data is a rough visualization of the results in side-by-side violin plots that show the distribution of microbenchmark timings. We use the data that we just generated in the previous code chunk to plot a density violin for each method. Each density violin is overlaid with a boxplot showing the median and the first and third quartiles, in addition to a dashed line representing the mean. We also constrain the vertical axis of the plot to focus on the area where most of the fddm methods lie; depending on your system, this will likely hide the results from rtdists as they are typically much higher than that of fddm.

For this visualization we will use the benchmark data where the response times were handed over as a vector to each fddm call. As discussed above, running the benchmark on individual response times is considerably slower as the loop over the response times resides in R and not in C++.

library("reshape2")
library("ggplot2")

# load data, will be in the variable 'bm_vec'
load(system.file("extdata", "bm_vec.Rds", package = "fddm", mustWork = TRUE))

t_idx <- match("SV", colnames(bm_vec))
bm_vec[, -seq_len(t_idx)] <- bm_vec[, -seq_len(t_idx)]/1000 # convert to microseconds
mbm_vec <- melt(bm_vec, measure.vars = -seq_len(t_idx),
                variable.name = "FuncName", value.name = "time")

Names_vec <- c("fb_SWSE_17", "fb_SWSE_14", "fb_Gon_17", "fb_Gon_14",
               "fb_Nav_17", "fb_Nav_14", "fs_SWSE_17", "fs_SWSE_14",
               "fs_Gon_17", "fs_Gon_14", "fs_Nav_17", "fs_Nav_14",
               "fl_Nav_09", "RWiener", "Gondan", "rtdists")
Color_vec <- c("#e000b4", "#ff99eb", "#e68a00", "#ffb366",
               "#006699", "#66ccff", "#9900cc", "#cc99ff",
               "#c2a500", "#d7db42", "#336600", "#33cc33",
               "#996633", "#ff9999", "#ff5050", "#990000")

mi <- min(bm_vec[, -seq_len(t_idx)])
ma <- max(bm_vec[, (t_idx+1):(ncol(bm_vec)-4)])

ggplot(mbm_vec, aes(x = factor(FuncName, levels = Names_vec), y = time,
                    color = factor(FuncName, levels = Names_vec),
                    fill = factor(FuncName, levels = Names_vec))) +
       geom_violin(trim = TRUE, alpha = 0.5) +
       scale_color_manual(values = Color_vec, guide = FALSE) +
       scale_fill_manual(values = Color_vec, guide = FALSE) +
       geom_boxplot(width = 0.15, fill = "white", alpha = 0.5) +
       stat_summary(fun = mean, geom = "errorbar",
                    aes(ymax = ..y.., ymin = ..y..),
                    width = .35, linetype = "dashed") +
       coord_cartesian(ylim = c(mi, ma)) +
       labs(title = "Distribution of median benchmark times",
            subtitle = "Dashed lines represent mean benchmark times",
            x = "Method", y = "Time (microseconds)",
            color = "Method") +
       theme_bw() +
       theme(panel.border = element_blank(),
             plot.title = element_text(size = 23),
             plot.subtitle = element_text(size = 16),
             axis.text.x = element_text(size = 16, angle = 90,
                                        vjust = 0.5, hjust = 1),
             axis.text.y = element_text(size = 16),
             axis.title.x = element_text(size = 20),
             axis.title.y = element_text(size = 20),
             legend.position = "none")

Depending on your system, the results might differ from the ones shown here, but they should replicate the general pattern. The small-time methods (“fs_”) have long tails due to very few outliers, but their central tendencies are rather fast. The methods combining small-time and large-time (“fb_”) show an almost normal distribution with only a short tail; they also have the fastest central tendencies. Of the fddm methods, the large-time approach (“fl_”) is the slowest with a long tail. The function from the Rwiener package combines small-time and large-time approach and also shows an almost normal distribution, but its distribution of median benchmark times is shifted higher compared to those of fddm. The other two methods are clearly slower; the function provided by Gondan, Blurton, and Kesselmeier (2014) is purely in R so it is constrained by the speed of R’s vectorization. Similarly constrained by R, the function from the rtdists package performs quite a few computations in R that make it very user-friendly but at the cost of having the slowest central tendencies.

To visualize the problem areas for each method, we plot that method’s median benchmark times as a function of each individual parameter value; this will allow us to see where in the parameter space each method is efficient and where it is inefficient. In each of the following plots, we will vary one model parameter and show the mean of the median benchmark times, the 10% quantiles and 90% quantiles of the median benchmark times, as well as the minimum and maximum median benchmark times. For each value of that parameter, we aggregate the benchmark data for all values of the other parameter to generate the statistics plotted. For example, in a plot illustrating the effect of varying \(v\), the statistics plotted for the parameter value \(v = 0\) will include the benchmark data for the full ranges of \(rt\), \(a\), \(w\), and \(sv\). Note that because we also vary the response time in these plots, all of the following plots show the results based on the individual response times (i.e., rt_benchmark_ind). As this benchmark function records the evaluation time of approximating the density for only one response time in contrast to a vector of response times, we expect the typical benchmark times produced by rt_benchmark_ind to be faster than that of rt_benchmark_vec.

To avoid too many repetitious plots, we will only include six of the methods from the previous plot that illustrate the different variations and implementations of the density functions. We include: the fastest small-time implementation in dfddm (“fs_SWSE_17”), the one large-time implementation of the large-time in dfddm (“fl_Nav_09”), the fastest implementation in dfddm that combines the small-time and large-time (“fb_SWSE_17”), and the three methods from R packages and the literature as described above.

The following code chunk loads the pre-run benchmark data and prepares it for the subsequent plots that will be generated by the later code chunks.

# load data, will be in the variable 'bm_ind'
load(system.file("extdata", "bm_ind.Rds", package = "fddm", mustWork = TRUE))
bm_ind[["RTAA"]] <- bm_ind[["RT"]] / bm_ind[["A"]] / bm_ind[["A"]]
bm_ind <- bm_ind[, c(1, 2, ncol(bm_ind), 3:(ncol(bm_ind)-1)) ]

t_idx <- match("SV", colnames(bm_ind))
bm_ind[,-seq_len(t_idx)] <- bm_ind[, -seq_len(t_idx)]/1000 # convert to microseconds
mbm_ind <- melt(bm_ind, measure.vars = -seq_len(t_idx),
                variable.name = "FuncName", value.name = "time")

Names_meq <- c("fb_SWSE_17", "fs_SWSE_14", "fl_Nav_09",
               "RWiener", "Gondan", "rtdists")
Color_meq <- c("#e000b4", "#cc99ff", "#996633",
               "#ff9999", "#ff5050", "#990000")
mbm_meq <- subset(mbm_ind, FuncName %in% Names_meq)

Effective Response Time

We first investigate how the response times affect the performance of each method. When calculating the density, the main culprit of longer evaluation times is the approximation of the infinite sum. Since the response time, \(rt\), is always scaled by the threshold separation, \(a\), in the infinite sum, we combine these model parameters and instead examine the “effective response time”, defined as \(\frac{rt}{a^2}\). This code chunk will generate a plot showing how varying the “effective response time” can affect the median benchmark time of the approximations.

ggplot(mbm_meq, aes(x = RTAA, y = time,
                    color = factor(FuncName, levels = Names_meq),
                    fill = factor(FuncName, levels = Names_meq))) +
  stat_summary(fun.min = min, fun.max = max,
               geom = "ribbon", color = NA, alpha = 0.1) +
  stat_summary(fun.min = function(z) { quantile(z, 0.1) },
               fun.max = function(z) { quantile(z, 0.9) },
               geom = "ribbon", color = NA, alpha = 0.2) +
  stat_summary(fun = mean, geom = "line") +
  scale_x_log10() +
  scale_color_manual(values = Color_meq) +
  scale_fill_manual(values = Color_meq) +
  labs(title = "Means of the median microbenchmark results",
       subtitle = paste(
         "The shaded regions represent the 10% and 90% quantiles",
         "The lighter shaded regions represent the min and max times",
         sep = ";\n"),
       x = bquote(frac(rt, a^2) ~ ", effective response time, " ~ log[10]),
       y = "Time (microseconds)") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5, 5, 15, 5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        strip.text = element_text(size = 14),
        strip.background = element_rect(fill = "white"),
        legend.position = "none") +
  facet_wrap(~ factor(FuncName, levels = Names_meq), scales = "free_y")

This plot confirms that the aptly named small-time and large-time density functions are generally more efficient at calculating the density for small response times and large response times, respectively. Of the three methods used from the literature, the RWiener and rtdists functions use a switching mechanism between the small-time and large-time density functions, and the algorithm from Gondan, Blurton, and Kesselmeier (2014) only uses the small-time density function. Importantly, the methods that utilize a switching mechanism between the small-time and large-time density functions have much flatter plots and generally perform well across the \(\frac{rt}{a^2}\) parameter space.

Relative Starting Point

Similarly to the previous plot, this code chunk plots the effect on the median benchmark time; but this time we will instead vary the model parameter \(w\), the relative starting point.

ggplot(mbm_meq, aes(x = W, y = time,
                    color = factor(FuncName, levels = Names_meq),
                    fill = factor(FuncName, levels = Names_meq))) +
  stat_summary(fun.min = min, fun.max = max,
               geom = "ribbon", color = NA, alpha = 0.1) +
  stat_summary(fun.min = function(z) { quantile(z, 0.1) },
               fun.max = function(z) { quantile(z, 0.9) },
               geom = "ribbon", color = NA, alpha = 0.2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_manual(values = Color_meq) +
  scale_fill_manual(values = Color_meq) +
  labs(title = "Means of the median microbenchmark results",
       subtitle = paste(
         "The shaded regions represent the 10% and 90% quantiles",
         "The lighter shaded regions represent the min and max times",
         sep = ";\n"),
       x = "w, relative starting point (a priori bias)",
       y = "Time (microseconds)") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5, 5, 15, 5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        strip.text = element_text(size = 14),
        strip.background = element_rect(fill = "white"),
        legend.position = "none") +
  facet_wrap(~ factor(FuncName, levels = Names_meq), scales = "free_y")

The relative starting point \(w\) has very little impact on the infinite sum, and thus we see that the means and quantiles are relatively flat across the range of \(w\). However, we see a rather clear dip around 0.5 for the maximum times of many methods.

Drift Rate

Here we plot the effects of varying the parameter \(v\), the drift rate, on the median benchmark time.

ggplot(mbm_meq, aes(x = V, y = time,
                    color = factor(FuncName, levels = Names_meq),
                    fill = factor(FuncName, levels = Names_meq))) +
  stat_summary(fun.min = min, fun.max = max,
               geom = "ribbon", color = NA, alpha = 0.1) +
  stat_summary(fun.min = function(z) { quantile(z, 0.1) },
               fun.max = function(z) { quantile(z, 0.9) },
               geom = "ribbon", color = NA, alpha = 0.2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_manual(values = Color_meq) +
  scale_fill_manual(values = Color_meq) +
  labs(title = "Means of the median microbenchmark results",
       subtitle = paste(
         "The shaded regions represent the 10% and 90% quantiles",
         "The lighter shaded regions represent the min and max times",
         sep = ";\n"),
       x = "v, drift rate",
       y = "Time (microseconds)") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5, 5, 15, 5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        strip.text = element_text(size = 14),
        strip.background = element_rect(fill = "white"),
        legend.position = "none") +
  facet_wrap(~ factor(FuncName, levels = Names_meq), scales = "free_y")

The drift rate \(v\) has nothing to do with the infinite sum, which is why we see overall flat means across the range of \(v\). The only pattern we see is a small bump for \(v = 0\) of the small-time method.

Inter-trial Variablity in the Drift Rate

Lastly we plot the effects of varying \(sv\), the inter-trial variability in the drift rate, on the median benchmark time.

ggplot(mbm_meq, aes(x = SV, y = time,
                    color = factor(FuncName, levels = Names_meq),
                    fill = factor(FuncName, levels = Names_meq))) +
  stat_summary(fun.min = min, fun.max = max,
               geom = "ribbon", color = NA, alpha = 0.1) +
  stat_summary(fun.min = function(z) { quantile(z, 0.1) },
               fun.max = function(z) { quantile(z, 0.9) },
               geom = "ribbon", color = NA, alpha = 0.2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_manual(values = Color_meq) +
  scale_fill_manual(values = Color_meq) +
  labs(title = "Means of the median microbenchmark results",
       subtitle = paste(
         "The shaded regions represent the 10% and 90% quantiles",
         "The lighter shaded regions represent the min and max times",
         sep = ";\n"),
       x = "sv, inter-trial variability in the drift rate",
       y = "Time (microseconds)") +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(size = 20),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5, 5, 15, 5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        strip.text = element_text(size = 14),
        strip.background = element_rect(fill = "white"),
        legend.position = "none") +
  facet_wrap(~ factor(FuncName, levels = Names_meq), scales = "free_y")

The inter-trial variability in the drift rate, \(sv\), affects the evaluation times similarly to its related parameter \(v\). Neither one of these parameters is integral to the approximation of the infinite sum, and that insignificance is shown by the flat means across the range of \(sv\).

Overall, these plots reveal the unsurprising conclusion that the small-time density approximation generally struggled for large “effective response times” and the large-time density approximation is very inefficient for small “effective response times”. We see that the methods that combined the two density functions were noticeably more consistent across the entire range of “effective response times,” confirming that using a combination of both methods results in a very stable algorithm.

The other three figures that vary \(w\), \(v\), and \(sv\) do not reveal any meaningful patterns. Both \(v\) and \(sv\) have nothing to do with the infinite sum, and \(w\) only has a minimal role in the infinite sum; thus, we do not expect any great contributions to slowdowns in evaluation times of either the small-time or large-time density functions.

Benchmarking Model Fitting to Real-World Data


Having shown the performance of each approximation to the Ratcliff DDM density functions, we now examine how these methods compare in practice. To determine how each method behaves in a practical setting, we will use each density approximation as the basis for fitting the DDM to real-world data. The primary analysis of the methods’ performance will come from benchmarking the time taken for the parameter estimation using the different methods. In addition, a secondary analysis will compare the performance of the different methods by using the number of function evaluations used by the optimizer during the optimization process.

The parameters that we will fit are: \(a\) (threshold separation), \(v\) (drift rate), \(t_0\) (non-decision time/response time constant), \(w\) (relative starting point), and \(sv\) (inter-trial-variability of drift). Since the example data we are using consists of two different item types that each require a different correct response (i.e., for one item type the correct response is mapped to the lower boundary and for the other item type the correct response is mapped to the upper boundary), the model includes two different versions of \(v\) (drift rate): \(v_\ell\) for fitting to the truthful lower boundary, and \(v_u\) for fitting to the truthful upper boundary. As many of the approximations in dfddm have different styles, we only test a subset of all the available methods in dfddm to avoid unnecessary testing. After using the various density function approximations in fitting the DDM to real-world data, we then validate that the produced parameter estimates are consistent across the various methods within a given error tolerance.

We will not test for accuracy or consistency in this vignette because that has already been covered in the Validity Vignette and the optimization is deterministic so it will return the same result given the same inital setup.

Generating Benchmark Data for Parameter Estimation

The following subsections will define all of the functions used to generate the fittings and store the benchmark results and provide the code to run the full fitting. Since running the full fitting for all of the individuals in the data takes a long time, we will forgo running the fitting in this vignette and instead read pre-fit parameter estimates that used the provided code.

To avoid too many repetitious plots, we will only include six of the methods from the previous section that illustrate the variability in the different density function approximations. We include: the two fastest implementations in dfddm that combine the small-time and large-time (“fb_SWSE_17” and “fb_Gon_17”), the two fastest strictly small-time implementations in dfddm (“fs_SWSE_14” and “fs_Gon_17”), the one large-time implementation in dfddm (“fl_Nav_09”), as well as the one method available in the rtdists package (“rtdists”). We only use a selection of the methods available in dfddm to avoid redundancy in our testing since most of the methods have multiple sibling approximations that are extremely similar. Moreover, we do not require the RWiener package nor the Gondan raw R code because the associated density function approximations do not include an option for variability in drift rate. While we can convert the constant drift rate density to the variable drift rate density using a multiplicative factor, the densities are still potentially calculated incorrectly. For more details on the differences between the constant drift rate density functions and their variable drift rate counterparts, see the Math Vignette.

First we load the necessary packages.

library("fddm")
library("rtdists")
library("microbenchmark")

Log-Likelihood Functions

This code chunk defines the log-likelihood functions used in the optimization algorithm; all of the log-likelihood functions are the same as in the Validity Vignette. The log-likelihood functions are fairly straightforward and split the responses and associated response times by the true item status (i.e., the correct response) to enable fitting distinct drift rates (\(v_\ell\) for the items in which the correct response is the lower boundary and \(v_u\) for the items for which the correct response is the upper boundary). In addition, the log-likelihood functions heavily penalize any combination of parameters that returns a log-density of \(-\infty\) (equivalent to a regular density of \(0\)).

ll_fb_SWSE_17 <- function(pars, rt, resp, truth, err_tol) {
  v <- numeric(length(rt))
  v[truth == "upper"] <- pars[[1]]
  v[truth == "lower"] <- pars[[2]]
  dens <- dfddm(rt = rt, response = resp, a = pars[[3]], v = v,
                t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                n_terms_small = "SWSE", summation_small = "2017",
                scale = "both", err_tol = err_tol)
  return( ifelse(any(!is.finite(dens)), 1e6, -sum(dens)) )
}

ll_fb_Gon_17 <- function(pars, rt, resp, truth, err_tol) {
  v <- numeric(length(rt))
  v[truth == "upper"] <- pars[[1]]
  v[truth == "lower"] <- pars[[2]]
  dens <- dfddm(rt = rt, response = resp, a = pars[[3]], v = v,
                t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                n_terms_small = "Gondan", summation_small = "2017",
                scale = "both", err_tol = err_tol)
  return( ifelse(any(!is.finite(dens)), 1e6, -sum(dens)) )
}

ll_fs_SWSE_14 <- function(pars, rt, resp, truth, err_tol) {
  v <- numeric(length(rt))
  v[truth == "upper"] <- pars[[1]]
  v[truth == "lower"] <- pars[[2]]
  dens <- dfddm(rt = rt, response = resp, a = pars[[3]], v = v,
                t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                n_terms_small = "SWSE", summation_small = "2014",
                scale = "small", err_tol = err_tol)
  return( ifelse(any(!is.finite(dens)), 1e6, -sum(dens)) )
}

ll_fs_Gon_17 <- function(pars, rt, resp, truth, err_tol) {
  v <- numeric(length(rt))
  v[truth == "upper"] <- pars[[1]]
  v[truth == "lower"] <- pars[[2]]
  dens <- dfddm(rt = rt, response = resp, a = pars[[3]], v = v,
                t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                n_terms_small = "Gondan", summation_small = "2017",
                scale = "small", err_tol = err_tol)
  return( ifelse(any(!is.finite(dens)), 1e6, -sum(dens)) )
}

ll_fl_Nav_09 <- function(pars, rt, resp, truth, err_tol) {
  v <- numeric(length(rt))
  v[truth == "upper"] <- pars[[1]]
  v[truth == "lower"] <- pars[[2]]
  dens <- dfddm(rt = rt, response = resp, a = pars[[3]], v = v,
                t0 = pars[[4]], w = pars[[5]], sv = pars[[6]], log = TRUE,
                n_terms_small = "Navarro", scale = "large", err_tol = err_tol)
  return( ifelse(any(!is.finite(dens)), 1e6, -sum(dens)) )
}

ll_RTDists <- function(pars, rt, resp, truth) {
  rtu <- rt[truth == "upper"]
  rtl <- rt[truth == "lower"]
  respu <- resp[truth == "upper"]
  respl <- resp[truth == "lower"]

  densu <- ddiffusion(rtu, respu, a = pars[[3]], v = pars[[1]],
                      z = pars[[5]]*pars[[3]], t0 = pars[[4]], sv = pars[[6]])
  densl <- ddiffusion(rtl, respl, a = pars[[3]], v = pars[[2]],
                      z = pars[[5]]*pars[[3]], t0 = pars[[4]], sv = pars[[6]])

  densities <- c(densu, densl)
  if (any(densities <= 0)) return(1e6)
  return(-sum(log(densities)))
}

Fitting Function

We use a different set of initial parameter values for each combination of dataset and method. In a real-life situation we would probably do so using random initial parameter values to avoid local minima. Here we do this with a fixed set of initial parameter values to ensure that the different methods produce the same results for different starting points. However, there are a couple of restrictions to these initial values that we must address (we would also have to do so if we picked the initial values randomly). The parameter \(t_0\) must be less than the response time, so we set the initial values for \(t_0\) to be strictly less than the minimum response time according to each individual in the input data frame.

Furthermore, we must place a lower bound on \(a\) that is necessarily greater than zero because the optimization algorithm occasionally evaluates the log-likelihood functions (and thus the underlying density function approximations) using values of \(a\) equal to its bounds. In the case where optimization algorithm evaluates using \(a = 0\), the density function from the rtdists package does not evaluate. In common use this is not an issue because very small values of \(a\) do not make any sense with regard to the psychological interpretation of the parameter, but this issue can arise in an exploratory optimization environment.

The following code chunk defines the function that will run the optimization and produce the fitted parameter estimates for \(v_u\), \(v_\ell\), \(a\), \(t_0\), \(w\), and \(sv\). As discussed above, we only use a selection of methods from those available in dfddm. This fitting function will run the optimization for each set of initial parameter values and store the following results from the optimization: 1) resulting convergence code (either \(0\) indicating successful convergence or \(1\) indicating unsuccessful convergence); 2) minimized value of the objective log-likelihood function; 3) the number of iterations used by the optimizer during the optimization process; 4) the number of calls to the log-likelihood function during the optimization process; and 5) the median benchmark time of the optimization using each method.

As in the previous section, we run the microbenchmark function multiple times for each approximation at each point in this parameter space and save the median of these benchmark times. However, we only use a smaller number of repetitions here because a full optimization run entails several calls to the density function. Using the median of these data should reduce the number of outliers in the data while maintaining a truthful representation of the approximations’ performance. We will be using the microbenchmark package to time the evaluations.

rt_fit <- function(data, id_idx = NULL, rt_idx = NULL, response_idx = NULL,
                   truth_idx = NULL, response_upper = NULL, err_tol = 1e-6,
                   times = 100, unit = "ns") {

  # Format data for fitting
  if (all(is.null(id_idx), is.null(rt_idx), is.null(response_idx),
      is.null(truth_idx), is.null(response_upper))) {
    df <- data # assume input data is already formatted
  } else {
    if(any(data[,rt_idx] < 0)) {
      stop("Input data contains negative response times; fit will not be run.")
    }
    if(any(is.na(data[,response_idx]))) {
      stop("Input data contains invalid responses (NA); fit will not be run.")
    }

    nr <- nrow(data)
    df <- data.frame(id = character(nr),
                     rt = double(nr),
                     response = character(nr),
                     truth = character(nr),
                     stringsAsFactors = FALSE)

    if (!is.null(id_idx)) { # relabel identification tags
      for (i in 1:length(id_idx)) {
        idi <- unique(data[,id_idx[i]])
        for (j in 1:length(idi)) {
          df[["id"]][data[,id_idx[i]] == idi[j]] <- paste(
            df[["id"]][data[,id_idx[i]] == idi[j]], idi[j], sep = " ")
        }
      }
      df[["id"]] <- trimws(df[["id"]], which = "left")
    }

    df[["rt"]] <- as.double(data[,rt_idx])

    df[["response"]] <- "lower"
    df[["response"]][data[,response_idx] == response_upper] <- "upper"

    df[["truth"]] <- "lower"
    df[["truth"]][data[,truth_idx] == response_upper] <- "upper"
  }

  # Preliminaries
  ids <- unique(df[["id"]])
  nids <- max(length(ids), 1) # if inds is null, there is only one individual

  init_vals <- data.frame(v1 = c( 0,  10, -.5,  0,  0,  0,  0,  0,  0,   0,  0),
                          v0 = c( 0, -10,  .5,  0,  0,  0,  0,  0,  0,   0,  0),
                          a  = c( 1,   1,   1, .5,  5,  1,  1,  1,  1,   1,  1),
                          t0 = c( 0,   0,   0,  0,  0,  0,  0,  0,  0,   0,  0),
                          w  = c(.5,  .5,  .5, .5, .5, .5, .5, .2, .8,  .5, .5),
                          sv = c( 1,   1,   1,  1,  1,  1,  1,  1,  1, .05,  5))
  ninit_vals <- nrow(init_vals)

  algo_names <- c("fb_SWSE_17", "fb_Gon_17", "fs_SWSE_14",
                  "fs_Gon_17", "fl_Nav_09", "rtdists")
  nalgos <- length(algo_names)
  ni <- nalgos*ninit_vals

  # Initilize the result dataframe
  cnames <- c("ID", "Algorithm", "Convergence", "Objective", "Iterations",
              "FuncEvals", "BmTime")
  res <- data.frame(matrix(ncol = length(cnames), nrow = nids*ninit_vals*nalgos))
  colnames(res) <- cnames

  # label the result dataframe
  res[["ID"]] <- rep(ids, each = ni) # label individuals
  res[["Algorithm"]] <- rep(algo_names, each = ninit_vals) # label algorithms

  # Loop through each individual
  for (i in 1:nids) {
    # extract data for id i
    dfi <- df[df[["id"]] == ids[i], ]
    rti <- dfi[["rt"]]
    respi <- dfi[["response"]]
    truthi <- dfi[["truth"]]

    # starting value for t0 must be smaller than the smallest rt
    min_rti <- min(rti)
    t0_lo <- 0.01*min_rti
    t0_me <- 0.50*min_rti
    t0_hi <- 0.99*min_rti
    init_vals[["t0"]] <- c(rep(t0_me, 5), t0_lo, t0_hi, rep(t0_me, 4))

    # loop through all of the starting values
    for (j in 1:ninit_vals) {
      # get number of evaluations
      temp <- nlminb(init_vals[j, ], ll_fb_SWSE_17,
                     rt = rti, resp = respi, truth = truthi, err_tol = err_tol,
                     # limits:   vu,   vl,   a,      t0, w,  sv
                     lower = c(-Inf, -Inf, .01,       0, 0,   0),
                     upper = c( Inf,  Inf, Inf, min_rti, 1, Inf))
      res[["Convergence"]][(i-1)*ni+0*ninit_vals+j] <- temp[["convergence"]]
      res[["Objective"]][(i-1)*ni+0*ninit_vals+j] <- temp[["objective"]]
      res[["Iterations"]][(i-1)*ni+0*ninit_vals+j] <- temp[["iterations"]]
      res[["FuncEvals"]][(i-1)*ni+0*ninit_vals+j] <- temp[["evaluations"]][[1]]

      temp <- nlminb(init_vals[j, ], ll_fb_Gon_17,
                     rt = rti, resp = respi, truth = truthi, err_tol = err_tol,
                     # limits:   vu,   vl,   a,      t0, w,  sv
                     lower = c(-Inf, -Inf, .01,       0, 0,   0),
                     upper = c( Inf,  Inf, Inf, min_rti, 1, Inf))
      res[["Convergence"]][(i-1)*ni+1*ninit_vals+j] <- temp[["convergence"]]
      res[["Objective"]][(i-1)*ni+1*ninit_vals+j] <- temp[["objective"]]
      res[["Iterations"]][(i-1)*ni+1*ninit_vals+j] <- temp[["iterations"]]
      res[["FuncEvals"]][(i-1)*ni+1*ninit_vals+j] <- temp[["evaluations"]][[1]]

      temp <- nlminb(init_vals[j, ], ll_fs_SWSE_14,
                     rt = rti, resp = respi, truth = truthi, err_tol = err_tol,
                     # limits:   vu,   vl,   a,      t0, w,  sv
                     lower = c(-Inf, -Inf, .01,       0, 0,   0),
                     upper = c( Inf,  Inf, Inf, min_rti, 1, Inf))
      res[["Convergence"]][(i-1)*ni+2*ninit_vals+j] <- temp[["convergence"]]
      res[["Objective"]][(i-1)*ni+2*ninit_vals+j] <- temp[["objective"]]
      res[["Iterations"]][(i-1)*ni+2*ninit_vals+j] <- temp[["iterations"]]
      res[["FuncEvals"]][(i-1)*ni+2*ninit_vals+j] <- temp[["evaluations"]][[1]]

      temp <- nlminb(init_vals[j, ], ll_fs_Gon_17,
                     rt = rti, resp = respi, truth = truthi, err_tol = err_tol,
                     # limits:   vu,   vl,   a,      t0, w,  sv
                     lower = c(-Inf, -Inf, .01,       0, 0,   0),
                     upper = c( Inf,  Inf, Inf, min_rti, 1, Inf))
      res[["Convergence"]][(i-1)*ni+3*ninit_vals+j] <- temp[["convergence"]]
      res[["Objective"]][(i-1)*ni+3*ninit_vals+j] <- temp[["objective"]]
      res[["Iterations"]][(i-1)*ni+3*ninit_vals+j] <- temp[["iterations"]]
      res[["FuncEvals"]][(i-1)*ni+3*ninit_vals+j] <- temp[["evaluations"]][[1]]

      temp <- nlminb(init_vals[j, ], ll_fl_Nav_09,
                     rt = rti, resp = respi, truth = truthi, err_tol = err_tol,
                     # limits:   vu,   vl,   a,      t0, w,  sv
                     lower = c(-Inf, -Inf, .01,       0, 0,   0),
                     upper = c( Inf,  Inf, Inf, min_rti, 1, Inf))
      res[["Convergence"]][(i-1)*ni+4*ninit_vals+j] <- temp[["convergence"]]
      res[["Objective"]][(i-1)*ni+4*ninit_vals+j] <- temp[["objective"]]
      res[["Iterations"]][(i-1)*ni+4*ninit_vals+j] <- temp[["iterations"]]
      res[["FuncEvals"]][(i-1)*ni+4*ninit_vals+j] <- temp[["evaluations"]][[1]]

      temp <- nlminb(init_vals[j, ], ll_RTDists,
                     rt = rti, resp = respi, truth = truthi,
                     # limits:   vu,   vl,   a,      t0, w,  sv
                     lower = c(-Inf, -Inf, .01,       0, 0,   0),
                     upper = c( Inf,  Inf, Inf, min_rti, 1, Inf))
      res[["Convergence"]][(i-1)*ni+5*ninit_vals+j] <- temp[["convergence"]]
      res[["Objective"]][(i-1)*ni+5*ninit_vals+j] <- temp[["objective"]]
      res[["Iterations"]][(i-1)*ni+5*ninit_vals+j] <- temp[["iterations"]]
      res[["FuncEvals"]][(i-1)*ni+5*ninit_vals+j] <- temp[["evaluations"]][[1]]

      # microbenchmark
      mbm <- microbenchmark(
        fb_SWSE_17 = nlminb(init_vals[j,], ll_fb_SWSE_17, err_tol = err_tol,
                            rt = rti, resp = respi, truth = truthi,
                            # limits:   vu,   vl,   a,      t0, w,  sv
                            lower = c(-Inf, -Inf, .01,       0, 0,   0),
                            upper = c( Inf,  Inf, Inf, min_rti, 1, Inf)),
        fb_Gon_17 = nlminb(init_vals[j,], ll_fb_Gon_17, err_tol = err_tol,
                           rt = rti, resp = respi, truth = truthi,
                           # limits:   vu,   vl,   a,      t0, w,  sv
                           lower = c(-Inf, -Inf, .01,       0, 0,   0),
                           upper = c( Inf,  Inf, Inf, min_rti, 1, Inf)),
        fs_SWSE_14 = nlminb(init_vals[j,], ll_fs_SWSE_14, err_tol = err_tol,
                            rt = rti, resp = respi, truth = truthi,
                            # limits:   vu,   vl,   a,      t0, w,  sv
                            lower = c(-Inf, -Inf, .01,       0, 0,   0),
                            upper = c( Inf,  Inf, Inf, min_rti, 1, Inf)),
        fs_Gon_17 = nlminb(init_vals[j,], ll_fs_Gon_17, err_tol = err_tol,
                           rt = rti, resp = respi, truth = truthi,
                           # limits:   vu,   vl,   a,      t0, w,  sv
                           lower = c(-Inf, -Inf, .01,       0, 0,   0),
                           upper = c( Inf,  Inf, Inf, min_rti, 1, Inf)),
        fl_Nav_09 = nlminb(init_vals[j,], ll_fl_Nav_09, err_tol = err_tol,
                           rt = rti, resp = respi, truth = truthi,
                           # limits:   vu,   vl,   a,      t0, w,  sv
                           lower = c(-Inf, -Inf, .01,       0, 0,   0),
                           upper = c( Inf,  Inf, Inf, min_rti, 1, Inf)),
        rtdists = nlminb(init_vals[j,], ll_RTDists,
                         rt = rti, resp = respi, truth = truthi,
                         # limits:   vu,   vl,   a,      t0, w,  sv
                         lower = c(-Inf, -Inf, .01,       0, 0,   0),
                         upper = c( Inf,  Inf, Inf, min_rti, 1, Inf)),
        times = times, unit = unit
      )
      for (k in 1:nalgos) {
        res[["BmTime"]][(i-1)*ni+(k-1)*ninit_vals+j] <- median(
          mbm[mbm[["expr"]] == algo_names[k], 2])
      }
    }
  }
  return(res)
}

Running the Fitting

As an example dataset, we use the med_dec data that comes with fddm. This dataset contains the accuracy condition reported in Trueblood et al. (2018), which investigates medical decision making among medical professionals (pathologists) and novices (i.e., undergraduate students). The task of participants was to judge whether pictures of blood cells show cancerous cells (i.e., blast cells) or non-cancerous cells (i.e., non-blast cells). The dataset contains 200 decision per participant, based on pictures of 100 true cancerous cells and pictures of 100 true non-cancerous cells. We remove the trials without response (indicated by rt < 0 in the data) before fitting.

Having set up the fitting functions in the above chunks of code, we could pass the med_dec data to this function to get the parameter estimates. However, as this takes a relatively long time (around 30 minutes), we skip the fitting in this vignette and instead read the pre-fit parameter estimates in the next section.

data(med_dec, package = "fddm")
med_dec <- med_dec[which(med_dec[["rt"]] >= 0), ]
fit <- rt_fit(med_dec, id_idx = c(2,1), rt_idx = 8, response_idx = 7,
              truth_idx = 5, response_upper = "blast", err_tol = 1e-6,
              times = 25, unit = "ns")

Analysis of Benchmark Results

We begin by loading the required packages for plotting the results.

library("reshape2")
library("ggplot2")

Before visualizing the results of the data fitting benchmark tests, we need to provide some context for each set of parameter estimates since each run of the optimization does not necessarily converge. To add this context, we first find the minimum value of the objective function (i.e., the log-likelihood function) attained for each set of initial parameter values for each individual. Then we calculate how the objective value of each optimization run compares to that minimum; we will refer to this quantity as the objective difference. The following code chunk defines a function to perform this calculation and some other minor operations that will be helpful when plotting the results. This code chunk also applies this prep function to the pre-run benchmark data and defines some constants that we will use throughout the visualizations.

fit_prep <- function(fit, eps = 1e-4) {
  nr <- nrow(fit)
  fit[["Obj_diff"]] <- rep(0, nr)

  ids <- unique(fit[["ID"]])
  nids <- length(ids)
  algos <- unique(fit[["Algorithm"]])
  nalgos <- length(algos)

  ninit <- nrow(fit[fit[["ID"]] == ids[1] & fit[["Algorithm"]] == algos[1], ])
  for (i in 1:nids) {
    for (j in 1:ninit) {
      idx <- which(fit[["ID"]] == ids[i])[ninit*(0:(nalgos-1)) + j]
      objs <- fit[idx, "Objective"]
      min_obj <- min(objs)
      abs_min_obj <- abs(min_obj)
      obj_diffs <- objs - min(objs)
      fit[idx, "Obj_diff"] <- ifelse(obj_diffs <= eps*abs_min_obj, 0,
        ifelse(obj_diffs > eps*abs_min_obj & obj_diffs <= 2*abs_min_obj, 1, 3))
    }
  }

  fit[["BmTime"]] <- fit[["BmTime"]]*1e-6 # convert to milliseconds
  fit[["Convergence"]] <- ifelse(fit[["Convergence"]] < 1, 0, 1)

  return(fit)
}

# load data, will be in the variable 'fit'
load(system.file("extdata", "bm_fit.Rds", package = "fddm", mustWork = TRUE))
fit <- fit_prep(fit)

Names <- c("fb_SWSE_17", "fb_Gon_17", "fs_SWSE_14",
           "fs_Gon_17", "fl_Nav_09", "rtdists")
Color <- c("#e000b4", "#e68a00", "#cc99ff",
          "#c2a500", "#996633", "#990000")
Shape <- c(21, 25)
Sizes <- c(0, 3, 3)
Stroke <- c(0, 1, 1)
Fills <- c("#ffffff00", "#ffffff00", "#80808099")

We will now present two plots to illustrate how the various methods perform when used in fitting Ratcliff’s DDM to real-world data. The measures of interest for each method are the median microbenchmark timings and the number of function evaluations in the optimization process. Each one of the two plots will focus on one of these measures by showing a distribution of values of this measure in terms of violin plots and box plots. The distribution shown is the distribution across participants and starting values for each of the different methods of approximating the diffusion model; that is, for each of the 55 participants, we present results based on the 11 different starting values for each method (the starting values are constant across methods).

To help explain problematic fits, the plots will show any data point that represents an optimization run for which the objective difference is greater than our predefined tolerance of \(\epsilon = 0.0001\). More specifically, the objective difference is comparatively small (i.e., smaller than 2 on the log-likelihood scale) for the points shown with a completely transparent fill color; contrastingly, points shown with a gray fill color have a difference from the optimum of larger than 2 points on the log-likelihood scale (2 points on the log-likelihood scale corresponds to 1 point on the deviance scale). Data points with objective difference smaller than \(\epsilon\) will not be shown on the plot. In addition, each of these visible data points will have a shape to indicate its convergence code: a circle for successful convergence, and a triangle for failed convergence.

The first plot will show the densities of the microbenchmark timings for each method.

fit_mbm <- melt(fit, id.vars = c("Algorithm", "Convergence", "Obj_diff"),
                measure.vars = "BmTime", value.name = "BmTime")[,-4]

mibm <- min(fit[fit[["Algorithm"]] != "rtdists", "BmTime"])
mabm <- max(fit[fit[["Algorithm"]] != "rtdists", "BmTime"])

ggplot(fit_mbm, aes(x = factor(Algorithm, levels = Names),
                    y = BmTime)) +
  geom_violin(trim = TRUE, alpha = 0.5,
              aes(color = factor(Algorithm, levels = Names),
                  fill = factor(Algorithm, levels = Names))) +
  geom_boxplot(width = 0.2, outlier.shape = NA,
               fill = "white", alpha = 0.4,
               aes(color = factor(Algorithm, levels = Names))) +
  stat_summary(fun = mean, geom = "errorbar",
               aes(ymax = ..y.., ymin = ..y..),
               width = .5, linetype = "dashed",
               color = Color) +
  scale_color_manual(values = Color, guide = FALSE) +
  scale_shape_manual(values = Shape,
                     name = "Convergence Code",
                     breaks = c(0, 1),
                     labels = c("Success", "Failure")) +
  scale_size_manual(values = Sizes, guide = FALSE) +
  scale_discrete_manual(aesthetics = "stroke", values = Stroke, guide = FALSE) +
  scale_fill_manual(values = Color, guide = FALSE) +
  ggnewscale::new_scale_fill() +
  scale_fill_manual(values = Fills,
                    name = "Objective Difference",
                    breaks = c(1, 2, 3),
                    labels = c("< 2", "NA", "> 2")) +
  geom_point(aes(color = factor(Algorithm, levels = Names),
                 shape = factor(Convergence, levels = c(0, 1)),
                 size = factor(Obj_diff, levels = c(0, 1, 3)),
                 stroke = factor(Obj_diff, levels = c(0, 1, 3)),
                 fill = factor(Obj_diff, levels = c(0, 1, 3)))) +
  coord_cartesian(ylim = c(mibm, mabm)) +
  labs(title = "Median microbenchmark times for data fitting",
       subtitle = paste(
         "Dashed lines represent mean benchmark times",
         "Visible points have an objective difference greater than 1e-4",
         sep = ";\n"),
       x = "Method", y = "Time (milliseconds)") +
  guides(shape = guide_legend(order = 1,
                              override.aes = list(size = Sizes[c(2, 3)])),
         fill = guide_legend(order = 2,
                             override.aes = list(size = Sizes[c(2, 3)],
                                                 shape = c(21, 21),
                                                 fill = Fills[c(2, 3)]))) +
  theme_bw() +
  theme(panel.border = element_blank(),
        plot.title = element_text(size = 23),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5, 5, 10, 5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20,
                                    margin = margin(5, 10, 5, 5, "pt")),
        legend.position = "right",
        legend.box = "vertical",
        legend.direction = "vertical",
        legend.background = element_rect(fill = "transparent"),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 13))

This plot vaguely resembles the microbenchmark violin plot from the previous section. Immediately we observe that fitting to data using the rtdists function is noticeably slower than fitting to data using the functions provided in fddm. The large-time approximation has many problematic data points, further emphasizing its shortcomings when handling smaller response times. Upon closer inspection, the means and medians of the benchmark times are slightly lower for the two small-time algorithms, “fs_SWSE_14” and “fs_Gon_17”, than for either of the algorithms that combine both the small-time and large-time density functions, “fb_SWSE_17” and “fb_Gon_17”. However, these two seemingly faster small-time algorithms have more problematic data points in the optimization process, likely due to the poor support for large “effective response times” that we discussed in a previous section of this vignette. We also see that the two combined-time algorithms both only show three total cases where the algorithm did not reach the optimum for one participant and starting value. These issues can be avoided by using multiple sets of initial values and choosing the best result from that set. Despite these few problematic fits, the two combined-time algorithms are quite stable and are still fast. If the data is known to contain only small response times, then using one of the small-time approximations may yield fast and unproblematic fitting results. However, instability may still arise from the optimization routine testing small values of \(a\); this would create large “effective response times” and cause the small-time density function to reveal its vulnerability. One fitting run using these algorithms takes on average under 30 milliseconds on the system used for running the tests.

The second plot will show the distribution of the number of function evaluations in the optimization process for each method.

fit_fev <- melt(fit, id.vars = c("Algorithm", "Convergence", "Obj_diff"),
                measure.vars = "FuncEvals", value.name = "FuncEvals")[,-4]

mife <- min(fit[fit[["Algorithm"]] != "rtdists", "FuncEvals"])
mafe <- max(fit[fit[["Algorithm"]] != "rtdists", "FuncEvals"])

ggplot(fit_fev, aes(x = factor(Algorithm, levels = Names),
                    y = FuncEvals)) +
  geom_violin(trim = TRUE, alpha = 0.5,
              aes(color = factor(Algorithm, levels = Names),
                  fill = factor(Algorithm, levels = Names))) +
  geom_boxplot(width = 0.2, outlier.shape = NA,
               fill = "white", alpha = 0.4,
               aes(color = factor(Algorithm, levels = Names))) +
  stat_summary(fun = mean, geom = "errorbar",
               aes(ymax = ..y.., ymin = ..y..),
               width = .5, linetype = "dashed",
               color = Color) +
  scale_color_manual(values = Color, guide = FALSE) +
  scale_shape_manual(values = Shape,
                     name = "Convergence Code",
                     breaks = c(0, 1),
                     labels = c("Success", "Failure")) +
  scale_size_manual(values = Sizes, guide = FALSE) +
  scale_discrete_manual(aesthetics = "stroke", values = Stroke, guide = FALSE) +
  scale_fill_manual(values = Color, guide = FALSE) +
  ggnewscale::new_scale_fill() +
  scale_fill_manual(values = Fills,
                    name = "Objective Difference",
                    breaks = c(1, 2, 3),
                    labels = c("< 2", "NA", "> 2")) +
  geom_point(aes(color = factor(Algorithm, levels = Names),
                 shape = factor(Convergence, levels = c(0, 1)),
                 size = factor(Obj_diff, levels = c(0, 1, 3)),
                 stroke = factor(Obj_diff, levels = c(0, 1, 3)),
                 fill = factor(Obj_diff, levels = c(0, 1, 3)))) +
  coord_cartesian(ylim = c(mife, mafe)) +
  labs(title = "Function evaluations for data fitting",
       subtitle = paste(
         "Dashed lines represent mean benchmark times",
         "Visible points have an objective difference greater than 1e-4",
         sep = ";\n"),
       x = "Method", y = "Number of function evaluations") +
  guides(shape = guide_legend(order = 1,
                              override.aes = list(size = Sizes[c(2, 3)])),
         fill = guide_legend(order = 2,
                             override.aes = list(size = Sizes[c(2, 3)],
                                                 shape = c(21, 21),
                                                 fill = Fills[c(2, 3)]))) +
  theme_bw() +
  theme(panel.border = element_blank(),
        plot.title = element_text(size = 23),
        plot.subtitle = element_text(size = 16,
                                     margin = margin(5, 5, 10, 5, "pt")),
        axis.text.x = element_text(size = 16, angle = 90,
                                   vjust = 0.5, hjust = 1),
        axis.text.y = element_text(size = 16),
        axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20,
                                    margin = margin(5, 10, 5, 5, "pt")),
        legend.position = "right",
        legend.box = "vertical",
        legend.direction = "vertical",
        legend.background = element_rect(fill = "transparent"),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 13))

Here we see that the algorithms that implement both the small-time and large-time density functions appear to be the most stable. They have relatively short tails, suggesting that the optimization routine has little trouble in finding the optimum parameter estimates. Moreover, these algorithms have few data points where the objective difference is high or the convergence code indicates unsuccessful convergence. In contrast, the large-time approximation suffers greatly from its inability to handle smaller response times. The rtdists function performs very well in this test, but it is still hindered by much longer benchmark times

In addition to being the fastest approximation available, the default method for fddm, “fb_SWSE_17”, appears to be a generally stable method for fitting the DDM to real-world data. The optimization routine handles this algorithm consistently well in that there are very few troublesome areas of the parameter space compared to that of the algorithms which only implement the small-time density function. The default method also outperforms its counterparts, “fb_Gon_17” and “fb_Nav_17”, because it uses a faster mechanism to choose between the two methods and exploits the large-time density function where it is most useful. When comparing the algorithms for benchmark time, the two small-time algorithms have a faster mean and median than the default method, but their lengthy tails indicate potential stability issues in certain areas of the parameter space. On balance, the default method, “fb_SWSE_17”, provides the best compromise between speed and stability of the currently available algorithms for approximating the probability density functions of the diffusion decision model. For more information about the default method, see its section in the Math Vignette.

R Session Info

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                           
#> [2] LC_CTYPE=English_United Kingdom.1252   
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.3.2        reshape2_1.4.4       microbenchmark_1.4-7
#> [4] RWiener_1.3-3        rtdists_0.11-2       fddm_0.2-2          
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.5       pillar_1.4.6     compiler_4.0.2   plyr_1.8.6      
#>  [5] tools_4.0.2      digest_0.6.25    evd_2.3-3        evaluate_0.14   
#>  [9] lifecycle_0.2.0  tibble_3.0.3     gtable_0.3.0     lattice_0.20-41 
#> [13] pkgconfig_2.0.3  rlang_0.4.7      Matrix_1.2-18    yaml_2.2.1      
#> [17] mvtnorm_1.1-1    expm_0.999-5     xfun_0.18        withr_2.3.0     
#> [21] dplyr_1.0.2      stringr_1.4.0    knitr_1.30       generics_0.0.2  
#> [25] vctrs_0.3.4      tidyselect_1.1.0 grid_4.0.2       ggnewscale_0.4.3
#> [29] glue_1.4.2       R6_2.4.1         survival_3.2-7   rmarkdown_2.4   
#> [33] farver_2.0.3     purrr_0.3.4      magrittr_1.5     scales_1.1.1    
#> [37] htmltools_0.5.0  ellipsis_0.3.1   splines_4.0.2    colorspace_1.4-1
#> [41] labeling_0.3     stringi_1.5.3    gsl_2.1-6        munsell_0.5.0   
#> [45] msm_1.6.8        crayon_1.3.4

References

Gondan, Matthias, Steven P Blurton, and Miriam Kesselmeier. 2014. “Even Faster and Even More Accurate First-Passage Time Densities and Distributions for the Wiener Diffusion Model.” Journal of Mathematical Psychology 60: 20–22.

Trueblood, Jennifer S., William R. Holmes, Adam C. Seegmiller, Jonathan Douds, Margaret Compton, Eszter Szentirmai, Megan Woodruff, Wenrui Huang, Charles Stratton, and Quentin Eichbaum. 2018. “The Impact of Speed and Bias on the Cognitive Processes of Experts and Novices in Medical Image Decision-Making.” Cognitive Research: Principles and Implications 3 (1): 28. https://doi.org/10.1186/s41235-018-0119-2.