This vignette addresses drake's known edge cases, pitfalls, and weaknesses that might not be fixed in future releases. For the most up-to-date information on unhandled edge cases, please visit the issue tracker, where you can submit your own bug reports as well. Be sure to search the closed issues too, especially if you are not using the most up-to-date development version of drake. For a guide to debugging and testing drake projects, please refer to the separate “debug” vignette.

Projects built with drake <= 4.4.0 are not back compatible with drake > 4.4.0.

Versions of drake after 4.4.0 have different caching internals. That means if you built your project with drake 4.4.0 or earlier, later versions of drake will refuse to run make() on it, since doing so could mangle your work. To migrate your project to a later version of drake, you have three options.

  1. Revert to a back-compatible version of drake with devtools::install_version("drake", "4.4.0"). Here, the devtools package must be installed (install.packages("devtools")).
  2. Run your project from scratch with make(..., force = TRUE).
  3. Use migrate_drake_project() to convert your project to the new format. The migrate_drake_project() function
    1. copies your old project to a backup folder.
    2. converts your project's cache to a format compatible with drake 5.0.0 and later.
    3. informs you whether the migration succeeded: that is, whether outdated targets remain outdated and up-to-date targets remain up to date after migration (see the sketch below).
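
A minimal sketch of option 3, assuming your project and its default .drake/ cache live in the current working directory:

library(drake)
# Back up the project, then convert the cache to the format used by newer versions of drake.
migrate_drake_project()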

Workflow plans

Externalizing commands in R script files

It is common practice to divide the work of a project into multiple R files, but if you do this, you will not get the most out of drake. Please see the best practices vignette for more details.

Beware unparsable symbols in your workflow plan.

In your workflow plan, be sure that target names can be parsed as symbols and commands can be parsed as R code. To be safe, use check_plan(my_plan) to screen for illegal symbols and other problem areas.

A common pitfall is using the evaluate_plan() function to expand wildcards after applying single quotes to file targets.

library(magrittr) # for the pipe operator %>%
drake_plan(
  data = readRDS("data_DATASIZE__.rds")
) %>%
  rbind(drake_plan(
    file.csv = write.csv(
      data_DATASIZE__, # nolint
      "file_DATASIZE__.csv"
    ),
    strings_in_dots = "literals",
    file_targets = TRUE
  )) %>%
  evaluate_plan(
    rules = list(DATASIZE__ = c("small", "large"))
  )
##             target                                command
## 1       data_small              readRDS('data_small.rds')
## 2       data_large              readRDS('data_large.rds')
## 3 'file.csv'_small write.csv(data_small, "file_small.csv")
## 4 'file.csv'_large write.csv(data_large, "file_large.csv")

The single quotes in the middle of 'file.csv'_small and 'file.csv'_large are illegal, and the target names do not even correspond to the files written. Instead, construct your workflow plan in multiple stages and apply the single quotes at the very end.

rules <- list(DATASIZE__ = c("small", "large"))
datasets <- drake_plan(data = readRDS("data_DATASIZE__.rds")) %>%
  evaluate_plan(rules = rules)

Plan the CSV files separately.

files <- drake_plan(
  file = write.csv(data_DATASIZE__, "file_DATASIZE__.csv"), # nolint
  strings_in_dots = "literals"
) %>%
  evaluate_plan(rules = rules)

Single-quote the file targets after evaluate_plan().

files$target <- paste0("'", files$target, ".csv'") # Add the extension and the single quotes.

Put the workflow plan together.

rbind(datasets, files)
##             target                                command
## 1       data_small              readRDS('data_small.rds')
## 2       data_large              readRDS('data_large.rds')
## 3 'file_small.csv' write.csv(data_small, "file_small.csv")
## 4 'file_large.csv' write.csv(data_large, "file_large.csv")

For finer control over target names in cases like this, you may want to use the wildcard package.

Commands are NOT perfectly flexible.

In your workflow plan data frame (produced by drake_plan() and accepted by make()), your commands can usually be flexible R expressions.

drake_plan(
  target1 = 1 + 1 - sqrt(sqrt(3)),
  target2 = my_function(web_scraped_data) %>% my_tidy
)
##    target                                   command
## 1 target1                     1 + 1 - sqrt(sqrt(3))
## 2 target2 my_function(web_scraped_data) %>% my_tidy

However, please try to avoid formulas and function definitions in your commands. You may be able to get away with drake_plan(f = function(x){x + 1}) or drake_plan(f = y ~ x) in some use cases, but be careful. It is generally better to define functions and formulas in your workspace and then let make() import them. (Alternatively, use the envir argument to make() to tightly control which imported functions are available.) Use check_plan() to help screen and quality-control your workflow plan data frame, use tracked() to see which items are reproducibly tracked, and use vis_drake_graph() and build_drake_graph() to see the dependency structure of your project.
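
As a minimal sketch of the preferred pattern (my_function and plan are illustrative names):

my_function <- function(x){ # Defined in the workspace, so make() imports it.
  x + 1
}
plan <- drake_plan(result = my_function(1))
check_plan(plan) # Screen the plan for problems.
tracked(plan)    # The reproducibly tracked items should include my_function.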


Install drake properly.

You must properly install drake using install.packages(), devtools::install_github(), or a similar approach. Functions like devtools::load_all() are insufficient, particularly for parallel computing functionality in which separate new R sessions try to require(drake).

Install all your packages.

Your workflow may depend on external packages such as ggplot2, dplyr, and MASS. Such packages must be formally installed with install.packages(), devtools::install_github(), devtools::install_local(), or a similar command. If you load uninstalled packages with devtools::load_all(), results may be unpredictable and incorrect.

Find and diagnose your errors.

When make() fails, use failed() and diagnose() to debug. Try the following out yourself.

## character(0)

f <- function(){
  stop("unusual error")

bad_plan <- drake_plan(target = f())

make(bad_plan)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## connect 11 imports: tmp, files, datasets, f, rules, simulate, reg1, my_plan, ...
## connect 1 target: target
## check 1 item: stop
## check 1 item: f
## check 1 item: target
## target target
## Error building target target: unusual error
## fail target
## Error: Target 'target' failed to build. Use diagnose(target) to retrieve diagnostic information.

failed() # From the last make() only
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## [1] "target"

diagnose() # From all previous make()s
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## [1] "target"

error <- diagnose(target)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake

str(error)
## List of 3
##  $ message: chr "unusual error"
##  $ call   : language f()
##  $ calls  :List of 3
##   ..$ : language (function() {     { ...
##   ..$ : language f()
##   ..$ : language stop("unusual error")
##   .. ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 4 3 4 23 3 23 4 4
##   .. .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x4fdfbc0> 
##  - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

error$calls # View the traceback.
## [[1]]
## (function() {
##     {
##         f()
##     }
## })()
## [[2]]
## f()
## [[3]]
## stop("unusual error")

Your workspace is modified by default.

As of version 3.0.0, drake's execution environment is the user's workspace by default. As a result, the workspace is vulnerable to the side effects of make() (which are generally limited to loading and unloading targets). To protect your workspace, you may want to create a custom evaluation environment containing all your imported objects and then pass it to the envir argument of make(). Here is how.

clean(verbose = FALSE)
envir <- new.env(parent = globalenv())
eval(
  expression({
    f <- function(x){
      g(x) + 1
    }
    g <- function(x){
      x + 1
    }
  }),
  envir = envir
)
myplan <- drake_plan(out = f(1:3))

make(myplan, envir = envir)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## connect 2 imports: f, g
## connect 1 target: out
## check 1 item: g
## check 1 item: f
## check 1 item: out
## target out

ls() # Check that your workspace did not change.
##  [1] "bad_plan"  "datasets"  "envir"     "error"     "f"        
##  [6] "files"     "good_plan" "my_plan"   "myplan"    "reg1"     
## [11] "reg2"      "rules"     "simulate"  "tmp"

ls(envir) # Check your evaluation environment.
## [1] "f"   "g"   "out"

envir$out
## [1] 3 4 5

readd(out)
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## [1] 3 4 5

Refresh the drake_config() list early and often.

The master configuration list returned by drake_config() is important to drake's internals, and you will need it for functions like outdated() and vis_drake_graph(). The config list corresponds to a single call to make(), and you should not modify it by hand afterwards. For example, modifying the targets element post-hoc will have no effect because the graph element will remain the same. It is best to just call drake_config() again.
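
For example, here is a sketch of the intended workflow, where my_plan stands for your workflow plan data frame:

config <- drake_config(my_plan) # One config list per prospective make().
outdated(config)                # Uses the graph stored inside config.
vis_drake_graph(config)
# If my_plan changes, regenerate the list rather than editing config$targets by hand.
config <- drake_config(my_plan)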

Take special precautions if your drake project is a package.

Some users like to structure their drake projects as formal R packages. The straightforward way to run such a project is to

  1. Write all your imported functions in *.R files in the package's R/ folder.
  2. Load the package's environment with devtools::load_all().
  3. Call drake::make().
env <- devtools::load_all("yourProject")$env # Has all your imported functions
drake::make(my_plan, envir = env)            # Run the project normally.

However, the simple strategy above only works for parLapply parallelism with jobs = 1 and for mclapply parallelism. For other kinds of parallelism, you must turn devtools::load_all("yourProject")$env into an ordinary environment that does not look like a package namespace. Thanks to Jasper Clarkberg, we have the following workaround.

  1. Clone devtools::load_all("yourProject")$env in order to change the binding environment of all your functions.
env <- devtools::load_all("yourProject")$env
env <- list2env(as.list(env), parent = globalenv())
  2. Change the enclosing environment of your functions using an unfortunate hack involving environment<-.
for (name in ls(env)){
  assign(
    x = name,
    envir = env,
    value = `environment<-`(get(name, envir = env), env)
  )
}
  3. Make sure drake does not attach yourProject as an external package.
package_name <- "yourProject" # devtools::as.package(".")$package # nolint
packages_to_load <- setdiff(.packages(), package_name)
  4. Run the project with make().
make(
  my_plan,                    # Prepared in advance
  envir = env,                # Environment of package "yourProject"
  parallelism = "Makefile",   # Or "parLapply"
  jobs = 2,
  packages = packages_to_load # Does not include "yourProject"
)

You may need to adapt this last workaround, depending on the structure of the package, yourProject.

The lazy_load flag does not work with "parLapply" parallelism.

Ordinarily, drake prunes the execution environment at every parallelizable stage. In other words, it loads all the dependencies and unloads anything superfluous for entire batches of targets. This approach may require too much memory for some use cases, so there is an option to delay the loading of dependencies using the lazy_load argument to make() (powered by delayedAssign()). There are two major risks.

  1. make(..., lazy_load = TRUE, parallelism = "parLapply", jobs = 2) does not work. If you want to use local multisession parallelism with multiple jobs and lazy loading, try "future_lapply" parallelism instead.

    load_basic_example() # Get the code with drake_example("basic").
    make(my_plan, lazy_load = TRUE, parallelism = "future_lapply")
  2. Delayed evaluation may cause the same dependencies to be loaded multiple times, and these duplicated loads could be slow.

Timeouts may be unreliable.

You can call make(..., timeout = 10) to time out each target after 10 seconds. However, timeouts rely on R.utils::withTimeout(), which in turn relies on setTimeLimit(). These functions are the best R can offer right now, but they have known issues, and timeouts may fail to take effect in certain environments.


Triggers and skipped imports

With alternate triggers and the option to skip imports, you can sacrifice reproducibility to gain speed. However, these options can throw the dependency network out of sync. You should only use them for testing and debugging.
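
A minimal sketch (trigger and skip_imports are the make() arguments these options refer to in this version of drake):

load_basic_example() # Get the code with drake_example("basic").
# For testing and debugging only: both choices sacrifice reproducibility for speed.
make(my_plan, trigger = "missing") # Only build targets that do not exist yet.
make(my_plan, skip_imports = TRUE) # Do not process imports at all.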

Dependencies are not tracked in some edge cases.

You should explicitly check which items drake reproducibly tracks in your workflow and which dependencies it detects for your targets (for example, with tracked(), deps(), and vis_drake_graph()).


Drake can be fooled into skipping objects that should be treated as dependencies. For example:

f <- function(){
  b <- get("x", envir = globalenv())           # x is incorrectly ignored
  file_dependency <- readRDS('input_file.rds') # 'input_file.rds' is incorrectly ignored # nolint
  digest::digest(file_dependency)
}
deps(f)
## [1] "digest::digest" "get"            "globalenv"      "readRDS"

command <- "x <- digest::digest('input_file.rds'); assign(\"x\", 1); x"
deps(command)
## [1] "'input_file.rds'" "assign"           "digest::digest"

Dynamic R Markdown / knitr reports

In dynamic knitr reports, drake automatically looks for active code chunks and learns dependencies from calls to loadd() and readd(). This behavior activates when

  1. the command in your workflow plan has an explicit reference to knit() (from the knitr package) or render() (from the rmarkdown package), and
  2. the *.Rmd/*.Rnw source file already exists before make().
load_basic_example() # Get the code with drake_example("basic").
## cache /tmp/Rtmp7EbM7A/Rbuild75ca54d145b9/drake/vignettes/.drake
## connect 15 imports: tmp, error, envir, files, datasets, myplan, f, rules, sim...
## connect 15 targets: 'report.md', small, large, regression1_small, regression1...
my_plan[1, ]
##        target                          command
## 1 'report.md' knit('report.Rmd', quiet = TRUE)

Above, the R Markdown report loads targets 'small', 'large', and 'coef_regression2_small' using code chunks marked for evaluation.

## [1] "'report.Rmd'"           "coef_regression2_small"
## [3] "knit"                   "large"                 
## [5] "small"

## [1] "'report.Rmd'"           "coef_regression2_small"
## [3] "large"                  "render"                
## [5] "small"

deps("'report.Rmd'") # These are actually dependencies of '' (output)
## [1] "coef_regression2_small" "large"                 
## [3] "small"

However, you must explicitly mention each and every target loaded into a report. The following examples are discouraged in code chunks because they do not reference any particular target directly or literally in a way that static code analysis can detect.

var <- "good_target"
# Works in isolation, but drake sees "var" literally as a dependency,
# not "good_target".
readd(target = var, character_only = TRUE)
loadd(list = var)
# All cached items are loaded, but none are treated as dependencies.
loadd(imports_only = TRUE)

The knit() and render() functions

Functions knit() and render() are special. When you explicitly mention them in a command for a target, you signal to drake that the target is a dynamic report (like an R Markdown *.Rmd file). Drake assumes you are using knit() from the knitr package and render() from the rmarkdown package. Thus, unless you know exactly what you are doing, please do not define your own custom knit() or render() functions or load different versions of these functions from other packages.
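
For example, here is a sketch of a plan for a dynamic report, patterned after the basic example above. It assumes report.Rmd already exists and loads its targets with loadd() and readd() in active code chunks:

library(knitr)
report_plan <- drake_plan(
  report.md = knit('report.Rmd', quiet = TRUE), # Explicit reference to knit().
  file_targets = TRUE                           # The target is the file 'report.md'.
)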

Functions produced by Vectorize()

With functions produced by Vectorize(), detecting dependencies is especially hard because the body of every such function is

args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
names <- if (is.null(names(args)))
    character(length(args)) else names(args)
dovec <- names %in% vectorize.args
do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]),
    SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))

Thus, if f is constructed with Vectorize(g, ...), drake searches g() for dependencies, not f(). More precisely, if drake sees that environment(f)[["FUN"]] exists and is a function, it analyzes and reproducibly tracks environment(f)[["FUN"]] instead of f() itself. So if the configuration settings of the vectorization change (such as which arguments are vectorized) but the core element-wise functionality stays the same, make() will not react. Also, if you hover over the f node in vis_drake_graph(hover = TRUE), you will see the body of environment(f)[["FUN"]], not the body of f().
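
Here is a small sketch of that behavior (g and f are illustrative names):

g <- function(x, y){
  x + y
}
f <- Vectorize(g, vectorize.args = "x")
identical(environment(f)[["FUN"]], g) # TRUE: this is the function drake analyzes.
deps(f)                               # Should report the same dependencies as deps(g).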

Compiled code is not reproducibly tracked.

Some R functions use .Call() to run compiled code in the backend. The R code in these functions is tracked, but not the compiled object called with .Call(), nor its C/C++/Fortran source.

Directories (folders) are not reproducibly tracked.

In your workflow plan, you can use single quotes to assert that some targets/imports are external files. However, entire directories (i.e. folders) cannot be reproducibly tracked this way. Please see issue 12 for a discussion.

Packages are not tracked as dependencies.

Drake may import functions from packages, but the packages themselves are not tracked as dependencies. For this, you will need other tools that support reproducibility beyond the scope of drake. Packrat creates a tightly-controlled local library of packages to extend the shelf life of your project. And with Docker, you can execute your project on a virtual machine to ensure platform independence. Together, packrat and Docker can help others reproduce your work even if they have different software and hardware.

High-performance computing

The practical utility of parallel computing

Drake claims that it can

  1. Build and cache your targets in parallel (in stages).
  2. Build and cache your targets in the correct order, finishing dependencies before starting targets that depend on them.
  3. Deploy your targets to the parallel backend of your choice.

However, the practical efficiency of the parallel computing functionality remains to be verified rigorously; serious performance studies are left for future work. In addition, each project has its own best parallel computing setup, and the user needs to tune it on a case-by-case basis. Some general considerations include the following.

Maximum number of simultaneous jobs

Be mindful of the maximum number of simultaneous parallel jobs you deploy. At best, too many jobs is poor etiquette on a system with many users and limited resources. At worst, too many jobs will crash a system. The jobs argument to make() sets the maximum number of simultaneous jobs in most cases, but not all.

For most of drake's parallel backends, jobs sets the maximum number of simultaneous parallel jobs. However, there are ways to break the pattern. For example, make(..., parallelism = "Makefile", jobs = 2, args = "--jobs=4") uses at most 2 jobs for the imports and at most 4 jobs for the targets. (In make(), args overrides jobs for the targets.) For make(..., parallelism = "future_lapply"), the jobs argument is ignored altogether. Instead, set the workers argument where it is available (for example, future::plan(multisession(workers = 2)) or future::plan(future.batchtools::batchtools_local(workers = 2))) in your preparations before make(). Alternatively, you might limit the maximum number of jobs by setting options(mc.cores = 2) before calling make(). Depending on the future backend you select with future::plan(), you may be able to use one of the other environment variables listed in ?future::future.options.
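
For example, here is a sketch of capping the workers for "future_lapply" parallelism (the exact future::plan() call depends on the backend you choose):

library(future)
future::plan(future::multisession, workers = 2) # At most 2 simultaneous workers.
options(mc.cores = 2)                           # Another cap that some backends respect.
load_basic_example() # Get the code with drake_example("basic").
make(my_plan, parallelism = "future_lapply")    # The jobs argument is ignored here.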

Parallel computing on Windows

On Windows, do not use make(..., parallelism = "mclapply", jobs = n) with n greater than 1. You could try, but jobs will just be demoted to 1. Instead, please replace "mclapply" with one of the other parallelism_choices() or let drake choose the parallelism backend for you. For make(..., parallelism = "Makefile"), Windows users need to download and install Rtools.

Configuring future/batchtools parallelism for clusters

The "future_lapply" backend unlocks a large array of distributed computing options on serious computing clusters. However, it is your responsibility to configure your workflow for your specific job scheduler. In particular, special batchtools *.tmpl configuration files are required, and the technique is described in the documentation of batchtools. You can find some examples of these files in the inst/templates folders of the batchtools and future.batchtools GitHub repositories. Drake has some built-in prepackaged example workflows. See drake_examples() to view your options, and then drake_example() to write the files for an example.

drake_example("sge")    # Sun/Univa Grid Engine workflow and supporting files
drake_example("slurm")  # SLURM
drake_example("torque") # TORQUE

To write just *.tmpl files from these examples, see the drake_batchtools_tmpl_file() function.
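
For example, the following sketch writes just the template file for the SLURM example (the exact file name it creates depends on the example):

drake_batchtools_tmpl_file("slurm") # Writes the batchtools *.tmpl file from the "slurm" example.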

Unfortunately, there is no one-size-fits-all *.tmpl configuration file for any job scheduler, so we cannot guarantee that the above examples will work for you out of the box. To learn how to configure the files to suit your needs, you should make sure you understand your job scheduler and batchtools.

Proper Makefiles are not standalone.

The Makefile generated by make(myplan, parallelism = "Makefile") is not standalone. Do not run it outside of drake::make(). Drake uses dummy timestamp files to tell the Makefile what to do, and running make in the terminal will most likely give incorrect results.

Makefile-level parallelism for imported objects and files

Makefile-level parallelism is only used for targets in your workflow plan data frame, not imports. To process imported objects and files, drake selects the best local parallel backend for your system and uses the jobs argument to make(). To use at most 2 jobs for imports and at most 4 jobs for targets, run

make(..., parallelism = "Makefile", jobs = 2, args = "--jobs=4")

Zombie processes

Some parallel backends, particularly mclapply and future::multicore, may create zombie processes. Zombie children are not usually harmful, but you may wish to kill them yourself. The following function by Carl Boneri should work on Unix-like systems. For a discussion, see drake issue 116.

fork_kill_zombies <- function(){
  includes <- "#include <sys/wait.h>"
  code <- "int wstat; while (waitpid(-1, &wstat, WNOHANG) > 0) {};"

  wait <- inline::cfunction(
    body = code,
    includes = includes,
    convention = ".C"
  )

  invisible(wait())
}


Cache customization is limited

The storage vignette describes how storage works in drake. As explained near the end of that vignette, you can plug custom storr caches into make(). However, non-RDS caches such as storr_dbi() may not work with parallel computing. In addition, please do not try to change the short hash algorithm of an existing cache.
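
For example, here is a sketch of plugging in a custom storr cache. It uses an RDS cache, the kind drake creates by default and the kind most likely to cooperate with parallel computing:

library(storr)
cache <- storr_rds("my_custom_cache", mangle_key = TRUE)
load_basic_example() # Get the code with drake_example("basic").
make(my_plan, cache = cache) # Non-RDS caches such as storr_dbi() may not work in parallel.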

Runtime predictions

In predict_runtime() and rate_limiting_times(), drake only accounts for the targets with logged build times. If some targets have not been timed, drake throws a warning and prints the untimed targets.
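
For example, here is a sketch using the config-based interface described earlier. It assumes a previous make() already logged build times for the targets:

load_basic_example() # Get the code with drake_example("basic").
make(my_plan)                   # Logs a build time for each target.
config <- drake_config(my_plan)
predict_runtime(config)         # Only accounts for targets with logged build times.
rate_limiting_times(config)     # Warns about and prints any untimed targets.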