dplyr 0.7.2

dplyr 0.7.1

dplyr 0.7.0

New data, functions, and features

This verb is powered by the new select_var() internal helper, which is exported as well. It is like select_vars() but returns a single variable.

Deprecated and defunct

Databases

This version of dplyr includes some major changes to how database connections work. By and large, you should be able to continue using your existing dplyr database code without modification, but there are two big changes that you should be aware of:

You can continue to use src_mysql(), src_postgres(), and src_sqlite(), but I recommend a new style that makes the connection to DBI clearer:

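library(dplyr)

# Open a DBI connection to an in-memory SQLite database and copy mtcars into it
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# tbl() creates a lazy reference to the table on that connection
mtcars2 <- tbl(con, "mtcars")
mtcars2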

This is particularly useful if you want to perform non-SELECT queries, as you can do whatever you want on the same connection with DBI::dbGetQuery() and DBI::dbExecute().
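A minimal sketch, assuming the DBI and RSQLite packages are installed: dbExecute() runs a statement such as DELETE and returns the number of affected rows, while dbGetQuery() runs a query and returns the result as a data frame.

```r
library(DBI)

# In-memory SQLite database holding a copy of mtcars
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# A non-SELECT statement: dbExecute() returns the number of rows affected
n_deleted <- dbExecute(con, "DELETE FROM mtcars WHERE cyl = 8")

# dbGetQuery() runs a query and returns a data frame
remaining <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM mtcars")

dbDisconnect(con)
```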

If you’ve implemented a database backend for dplyr, please read the backend news to see what’s changed from your perspective (not much). If you want to ensure your package works with both the current and previous version of dplyr, see wrap_dbplyr_obj() for helpers.

UTF-8

Colwise functions

Tidyeval

dplyr has a new approach to non-standard evaluation (NSE) called tidyeval. It is described in detail in vignette("programming") but, in brief, gives you the ability to interpolate values in contexts where dplyr usually works with expressions:

# Capture the grouping column as a quosure
my_var <- quo(homeworld)

# !! unquotes the captured expression inside group_by()
starwars %>%
  group_by(!!my_var) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)

This means that the underscored versions of the main verbs are no longer needed, and so these functions have been deprecated (but remain available for backward compatibility).
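The same mechanism lets you write your own functions that accept bare column names. The sketch below follows the enquo() and !! pattern; grouped_mean() is a hypothetical function for illustration, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: mean of `var` within each level of `group`
grouped_mean <- function(df, group, var) {
  group <- enquo(group)  # capture the caller's expression as a quosure
  var <- enquo(var)
  df %>%
    group_by(!!group) %>%                         # unquote into group_by()
    summarise(mean = mean(!!var, na.rm = TRUE))   # unquote into mean()
}

res <- grouped_mean(mtcars, cyl, mpg)
res
```

The caller writes grouped_mean(mtcars, cyl, mpg) with bare names, just as with dplyr's own verbs.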

Verbs

Joins

Select

Other

Combining and comparing

Vector functions

Other minor changes and bug fixes

dplyr 0.5.0

Breaking changes

Existing functions

Deprecated and defunct functions

New functions

Local backends

dtplyr

All data table related code has been separated out into a new dtplyr package. This decouples the development of the data.table interface from the development of the dplyr package. If both data.table and dplyr are loaded, you’ll get a message reminding you to load dtplyr.

Tibble

Functions related to the creation and coercion of tbl_dfs now live in their own package: tibble. See vignette("tibble") for more details.
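A brief sketch, assuming the tibble package is installed: tibble() creates a tbl_df (later columns may refer to earlier ones), and as_tibble() coerces an existing data frame.

```r
library(tibble)

# Create a tibble; `y` is computed from the column `x` defined before it
tb <- tibble(x = 1:3, y = x * 2)

# Coerce an ordinary data frame to a tibble
cars_tb <- as_tibble(mtcars)

class(cars_tb)
```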

tbl_cube

Remote backends

SQLite

SQL translation

Internals

This version includes an almost total rewrite of how dplyr verbs are translated into SQL. Previously, I used a rather ad hoc approach, which tried to guess when a new subquery was needed. Unfortunately this approach was fraught with bugs, so in this version I’ve implemented a much richer internal data model. Now there is a three-step process:

  1. When applied to a tbl_lazy, each dplyr verb captures its inputs and stores them in an op (short for operation) object.

  2. sql_build() iterates through the operations to build up an object that represents a SQL query. These objects are convenient for testing as they are lists, and are backend agnostic.

  3. sql_render() iterates through the queries and generates the SQL, using generics (like sql_select()) that can vary based on the backend.
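In current releases this machinery lives in the dbplyr package. Assuming dbplyr is installed, the sketch below walks a small pipeline through the three steps, using lazy_frame() and simulate_sqlite() in place of a real database connection. The exact SQL text differs between dbplyr versions, so no output is shown.

```r
library(dplyr)
library(dbplyr)

# A lazy table backed by a simulated SQLite connection (no database needed)
con <- simulate_sqlite()
lf <- lazy_frame(x = 1, y = 2, con = con)

# Step 1: each verb records an operation instead of running anything
q <- lf %>% filter(x > 0) %>% select(y)

# Step 2: build a backend-agnostic query object (an S3 list)
built <- sql_build(q, con = con)

# Step 3: render the query object to backend-specific SQL
sql <- sql_render(built, con = con)
sql
```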

In the short term, this increased abstraction is likely to lead to some minor performance decreases, but the chance of dplyr generating correct SQL is much, much higher. In the long term, these abstractions will make it possible to write a query optimiser/compiler in dplyr, which would make it possible to generate much more succinct queries.

If you have written a dplyr backend, you’ll need to make some minor changes to your package:

There were two other tweaks to the exported API, but these are less likely to affect anyone.

Minor improvements and bug fixes

Single table verbs

Dual table verbs

Vector functions

dplyr 0.4.3

Improved encoding support

Until now, dplyr’s support for non-UTF8 encodings has been rather shaky. This release brings a number of improvements to fix these problems: it’s probably not perfect, but should be a lot better than the previous version. This includes fixes to arrange() (#1280), bind_rows() (#1265), distinct() (#1179), and joins (#1315). print.tbl_df() also received a fix for strings with invalid encodings (#851).

Other minor improvements and bug fixes

Databases

Hybrid evaluation

dplyr 0.4.2

This is a minor release containing fixes for a number of crashes and issues identified by R CMD check. There is one new “feature”: dplyr no longer complains about unrecognised attributes, and instead just copies them over to the output.

dplyr 0.4.1

dplyr 0.4.0

New features

New vignettes

Minor improvements

Bug fixes

dplyr 0.3.0.1

dplyr 0.3

New functions

Programming with dplyr (non-standard evaluation)

Removed and deprecated features

Minor improvements and bug fixes

Minor improvements and bug fixes by backend

Databases

Data frames/tbl_df

Data tables

Cubes

dplyr 0.2

Piping

dplyr now imports %>% from magrittr (#330). I recommend that you use this instead of %.% because it is easier to type (since you can hold down the shift key) and is more flexible. With %>%, you can control which argument on the RHS receives the LHS by using the . pronoun. This makes %>% more useful with base R functions because they don’t always take the data frame as the first argument. For example, you could pipe mtcars to xtabs() with:

mtcars %>% xtabs( ~ cyl + vs, data = .)
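The same rule applies to lm(), which also takes its data as a later argument. In this sketch (assuming magrittr is installed), mtcars reaches the data argument through the . pronoun, so the piped call fits the same model as the direct one.

```r
library(magrittr)

# `.` marks where the left-hand side goes, so mtcars becomes the `data` argument
fit_piped <- mtcars %>% lm(mpg ~ wt, data = .)

# The same model fitted without the pipe
fit_direct <- lm(mpg ~ wt, data = mtcars)

coef(fit_piped)
```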

Thanks to @smbache for the excellent magrittr package. dplyr only provides %>% from magrittr, but magrittr contains many other useful functions. To use them, load magrittr explicitly: library(magrittr). For more details, see vignette("magrittr").

%.% will be deprecated in a future version of dplyr, but it won’t happen for a while. I’ve also deprecated chain() to encourage a single style of dplyr usage: please use %>% instead.

Do

do() has been completely overhauled. There are now two ways to use it, either with multiple named arguments or a single unnamed argument. group_by() + do() is equivalent to plyr::dlply(), except it always returns a data frame.

If you use named arguments, each argument becomes a list-variable in the output. A list-variable can contain any arbitrary R object so it’s particularly well suited for storing models.

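library(dplyr)
# Fit one linear model per cylinder group; each model is stored in the list-variable `lm`
models <- mtcars %>% group_by(cyl) %>% do(lm = lm(mpg ~ wt, data = .))
# Extract the R-squared of each model
models %>% summarise(rsq = summary(lm)$r.squared)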

If you use an unnamed argument, the result should be a data frame. This allows you to apply arbitrary functions to each group.

mtcars %>% group_by(cyl) %>% do(head(., 1))

Note the use of the . pronoun to refer to the data in the current group.
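Returning a data frame from each group is also a convenient way to collect per-group model summaries. A minimal sketch, where fit_coefs() is a hypothetical helper rather than part of dplyr:

```r
library(dplyr)

# Hypothetical helper: fit mpg ~ wt and return the coefficients as a one-row data frame
fit_coefs <- function(df) {
  fit <- lm(mpg ~ wt, data = df)
  data.frame(intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
}

# do() calls fit_coefs() on each group and row-binds the results,
# keeping the grouping column alongside them
coefs <- mtcars %>% group_by(cyl) %>% do(fit_coefs(.))
coefs
```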

do() also has an automatic progress bar. It appears if the computation takes longer than 5 seconds and lets you know (approximately) how much longer the job will take to complete.

New verbs

dplyr 0.2 adds three new verbs:

Minor improvements

Bug fixes

dplyr 0.1.3

Bug fixes

dplyr 0.1.2

New features

Bug fixes

dplyr 0.1.1

Improvements

Bug fixes