Dplyr uses non-standard evaluation (NSE) in all the important single table verbs: filter(), mutate(), summarise(), arrange(), select() and group_by(). NSE is important not only because it reduces typing; for database backends, it’s what makes it possible to translate R code into SQL. However, while NSE is great for interactive use, it’s hard to program with. This vignette describes how you can opt out of NSE in dplyr, and instead (with a little quoting) rely only on standard evaluation (SE).
Behind the scenes, NSE is powered by the lazyeval package. The goal is to provide an approach to NSE that you can learn once and then apply in many places (dplyr is the first of my packages to use this approach, but over time I will implement it everywhere). You may want to read the lazyeval vignettes, if you’d like to learn more about the underlying details, or if you’d like to use this approach in your own packages.
Every function in dplyr that uses NSE also has a version that uses SE. The name of the SE version is always the NSE name with an _ on the end. For example, the SE version of summarise() is summarise_(); the SE version of arrange() is arrange_(). These functions work very similarly to their NSE cousins, but their inputs must be “quoted”:
# NSE version:
summarise(mtcars, mean(mpg))
#>   mean(mpg)
#> 1  20.09062

# SE versions:
summarise_(mtcars, ~mean(mpg))
#>   mean(mpg)
#> 1  20.09062
summarise_(mtcars, quote(mean(mpg)))
#>   mean(mpg)
#> 1  20.09062
summarise_(mtcars, "mean(mpg)")
#>   mean(mpg)
#> 1  20.09062
There are three ways to quote inputs that dplyr understands: with a formula, ~mean(mpg); with quote(), quote(mean(mpg)); or as a string, "mean(mpg)".
It’s best to use a formula because a formula captures both the expression to evaluate and the environment where the evaluation occurs. This is important if the expression is a mixture of variables in a data frame and objects in the local environment:
constant1 <- function(n) ~n
summarise_(mtcars, constant1(4))
#>   n
#> 1 4

# Using anything other than a formula will fail because it doesn't
# know which environment to look in
constant2 <- function(n) quote(n)
summarise_(mtcars, constant2(4))
#> Error in eval(expr, envir, enclos): binding not found: 'n'
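To see why the formula succeeds where quote() fails, it can help to inspect the two objects directly. The following is a minimal base R sketch, not part of dplyr; the helper names make_formula and make_quoted are made up for illustration. It shows that a formula carries the environment it was created in, while a quoted symbol carries nothing beyond its name.

```r
# Hypothetical helpers mirroring constant1() and constant2() above.
make_formula <- function(n) ~n
make_quoted <- function(n) quote(n)

f <- make_formula(4)
q <- make_quoted(4)

# A formula remembers the environment where it was created, here the
# execution environment of make_formula(), which still holds `n`.
f_env <- environment(f)
get("n", envir = f_env)
#> [1] 4

# The right-hand side of a one-sided formula is its second element,
# and it can be evaluated in the captured environment.
eval(f[[2]], f_env)
#> [1] 4

# The quoted version is only a bare symbol; whoever evaluates it has to
# guess where `n` lives.
is.name(q)
#> [1] TRUE
```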
If you also want output variables to vary, you need to pass a list of quoted objects to the .dots argument:
n <- 10
dots <- list(~mean(mpg), ~n)
summarise_(mtcars, .dots = dots)
#>   mean(mpg)  n
#> 1  20.09062 10

summarise_(mtcars, .dots = setNames(dots, c("mean", "count")))
#>       mean count
#> 1 20.09062    10
What if you need to mingle constants and variables? Use the handy lazyeval::interp():
library(lazyeval)
# Interp works with formulas, quoted calls and strings (but formulas are best)
interp(~ x + y, x = 10)
#> ~10 + y
interp(quote(x + y), x = 10)
#> 10 + y
interp("x + y", x = 10)
#> [1] "10 + y"

# Use as.name if you have a character string that gives a variable name
interp(~ mean(var), var = as.name("mpg"))
#> ~mean(mpg)
# or supply the quoted name directly
interp(~ mean(var), var = quote(mpg))
#> ~mean(mpg)
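Putting interp() together with the .dots argument gives one way to write a function that takes a column name as a string. This is only a sketch: mean_of is a hypothetical helper, not a dplyr function, and it assumes an installed dplyr that still provides summarise_() (newer releases may emit a deprecation warning).

```r
library(dplyr)
library(lazyeval)

# Hypothetical helper: summarise the column named by `col` with its mean,
# and name the output column "mean_<col>".
mean_of <- function(data, col) {
  # as.name() turns the string into a symbol before it is interpolated,
  # so the formula becomes ~mean(mpg) rather than ~mean("mpg").
  expr <- interp(~ mean(var), var = as.name(col))
  dots <- setNames(list(expr), paste0("mean_", col))
  summarise_(data, .dots = dots)
}

res <- mean_of(mtcars, "mpg")
names(res)
#> [1] "mean_mpg"
```

The same helper works unchanged for any other numeric column, for example mean_of(mtcars, "wt").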
Because every action in R is a function call, you can use this same idea to modify functions:
interp(~ f(a, b), f = quote(mean))
#> ~mean(a, b)
interp(~ f(a, b), f = as.name("+"))
#> ~a + b
interp(~ f(a, b), f = quote(`if`))
#> ~if (a) b
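One possible use of this is to choose the summary function by name at run time. The sketch below assumes only lazyeval and base R; summary_expr is a hypothetical helper introduced for illustration. It interpolates both the function and the variable, then evaluates the resulting expression against mtcars with base eval().

```r
library(lazyeval)

# Hypothetical helper: build a one-sided formula that applies the function
# named `fun` to the variable named `col`.
summary_expr <- function(fun, col) {
  interp(~ f(x), f = as.name(fun), x = as.name(col))
}

e <- summary_expr("median", "wt")

# The right-hand side of the formula is its second element.
deparse(e[[2]])
#> [1] "median(wt)"

# Evaluate the expression with the data frame's columns in scope.
med <- eval(e[[2]], mtcars)
```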
If you already have a list of values, use .values:
interp(~ x + y, .values = list(x = 10))
#> ~10 + y

# You can also interpolate variables defined in the current
# environment, but this is a little risky because it's easy
# for this to change without you realising
y <- 10
interp(~ x + y, .values = environment())
#> ~x + 10
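A named list passed through .values is convenient when the constants come from elsewhere, such as a configuration object. The following sketch assumes a dplyr version that still provides filter_(); the names params, lo and hi are invented for this example.

```r
library(dplyr)
library(lazyeval)

# Hypothetical thresholds, e.g. read from a config file.
params <- list(lo = 15, hi = 25)

# Interpolate every element of the list into the formula by name.
cond <- interp(~ mpg > lo & mpg < hi, .values = params)
deparse(cond[[2]])
#> [1] "mpg > 15 & mpg < 25"

# Use the finished formula with the SE version of filter().
middle <- filter_(mtcars, cond)
```

Passing an explicit list, rather than environment(), keeps the set of interpolated values visible at the call site.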