dplyr 0.5.0

Breaking changes

Existing functions

Deprecated and defunct functions

New functions

Local backends

dtplyr

All data table related code has been separated out in to a new dtplyr package. This decouples the development of the data.table interface from the development of the dplyr package. If both data.table and dplyr are loaded, you'll get a message reminding you to load dtplyr.

Tibble

Functions to related to the creation and coercion of tbl_dfs, now live in their own package: tibble. See vignette("tibble") for more details.

tbl_cube

Remote backends

SQLite

SQL translation

Internals

This version includes an almost total rewrite of how dplyr verbs are translated into SQL. Previously, I used a rather ad-hoc approach, which tried to guess when a new subquery was needed. Unfortunately this approach was fraught with bugs, so in this version I've implemented a much richer internal data model. Now there is a three step process:

  1. When applied to a tbl_lazy, each dplyr verb captures its inputs and stores in a op (short for operation) object.

  2. sql_build() iterates through the operations building to build up an object that represents a SQL query. These objects are convenient for testing as they are lists, and are backend agnostics.

  3. sql_render() iterates through the queries and generates the SQL, using generics (like sql_select()) that can vary based on the backend.

In the short-term, this increased abstraction is likely to lead to some minor performance decreases, but the chance of dplyr generating correct SQL is much much higher. In the long-term, these abstractions will make it possible to write a query optimiser/compiler in dplyr, which would make it possible to generate much more succinct queries.

If you have written a dplyr backend, you'll need to make some minor changes to your package:

There were two other tweaks to the exported API, but these are less likely to affect anyone.

Minor improvements and bug fixes

Single table verbs

Dual table verbs

Vector functions

dplyr 0.4.3

Improved encoding support

Until now, dplyr's support for non-UTF8 encodings has been rather shaky. This release brings a number of improvement to fix these problems: it's probably not perfect, but should be a lot better than the previously version. This includes fixes to arrange() (#1280), bind_rows() (#1265), distinct() (#1179), and joins (#1315). print.tbl_df() also recieved a fix for strings with invalid encodings (#851).

Other minor improvements and bug fixes

Databases

Hybrid evaluation

dplyr 0.4.2

This is a minor release containing fixes for a number of crashes and issues identified by R CMD CHECK. There is one new "feature": dplyr no longer complains about unrecognised attributes, and instead just copies them over to the output.

dplyr 0.4.1

dplyr 0.4.0

New features

New vignettes

Minor improvements

Bug fixes

dplyr 0.3.0.1

dplyr 0.3

New functions

Programming with dplyr (non-standard evaluation)

Removed and deprecated features

Minor improvements and bug fixes

Minor improvements and bug fixes by backend

Databases

Data frames/tbl_df

Data tables

Cubes

dplyr 0.2

Piping

dplyr now imports %>% from magrittr (#330). I recommend that you use this instead of %.% because it is easier to type (since you can hold down the shift key) and is more flexible. With you %>%, you can control which argument on the RHS recieves the LHS by using the pronoun .. This makes %>% more useful with base R functions because they don't always take the data frame as the first argument. For example you could pipe mtcars to xtabs() with:

mtcars %>% xtabs( ~ cyl + vs, data = .)

Thanks to @smbache for the excellent magrittr package. dplyr only provides %>% from magrittr, but it contains many other useful functions. To use them, load magrittr explicitly: library(magrittr). For more details, see vignette("magrittr").

%.% will be deprecated in a future version of dplyr, but it won't happen for a while. I've also deprecated chain() to encourage a single style of dplyr usage: please use %>% instead.

Do

do() has been completely overhauled. There are now two ways to use it, either with multiple named arguments or a single unnamed arguments. group_by() + do() is equivalent to plyr::dlply, except it always returns a data frame.

If you use named arguments, each argument becomes a list-variable in the output. A list-variable can contain any arbitrary R object so it's particularly well suited for storing models.

library(dplyr)
models <- mtcars %>% group_by(cyl) %>% do(lm = lm(mpg ~ wt, data = .))
models %>% summarise(rsq = summary(lm)$r.squared)

If you use an unnamed argument, the result should be a data frame. This allows you to apply arbitrary functions to each group.

mtcars %>% group_by(cyl) %>% do(head(., 1))

Note the use of the . pronoun to refer to the data in the current group.

do() also has an automatic progress bar. It appears if the computation takes longer than 5 seconds and lets you know (approximately) how much longer the job will take to complete.

New verbs

dplyr 0.2 adds three new verbs:

Minor improvements

Bug fixes

dplyr 0.1.3

Bug fixes

dplyr 0.1.2

New features

Bug fixes

dplyr 0.1.1

Improvements

Bug fixes