Adding a new SQL backend

2016-06-23

This document describes how to add a new SQL backend to dplyr. To begin:

This document is still a work in progress, but it will hopefully get you started. If you’re familiar with how your database and at least one other database that dplyr supports already, this should be reasonably simple task. However, it is possible that a new database backend may need new methods - I’m happy to add those as needed.

Create the src object

Start by creating a new src function to represent the backend. Assuming we’re going to create a src for postgres, you’d call it src_postgres(), and you’d follow the pattern of an existing src. A simplified version of src_postgres() is show below:

src_postgres <- function(dbname = NULL, host = NULL, port = NULL, user = NULL,
                         password = NULL, ...) {

  con <- dbConnect(PostgreSQL(), host = host %||% "", dbname = dbname %||% "",
    user = user, password = password %||% "", port = port %||% "", ...)

  src_sql("postgres", con)
}

Use src_sql() to create a new S3 object with the correct structure. It must have a DBI connection, but it can store anything else that might be useful.

Next, implement a method for src_desc() that briefly describes the source:

#' @export
src_desc.src_postgres <- function(con) {
  info <- dbGetInfo(con)
  host <- if (info$host == "") "localhost" else info$host

  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}

If you read the source code for the real src_postgres() you’ll notice that it caches the getGetInfo() field on creation, since this saves a little time when printing tbls.

Before continuing, check that you can create a connection to a local database, and that you get a listing of the existing tables. If you have a problem at this point, you may need to check the DBI backend.

tbl

Next implement the tbl() method for your data source. This will probably just be:

tbl.src_mssql <- function(src, from, ...) {
  tbl_sql("mssql", src = src, from = from, ...)
}

Before continuing, make sure you can connect to an existing table, and that the results are printed reasonably. If not, that may indicate your database uses a non-standard DBI interface, and you’ll need to fix that before continuing.

This is also a good time implement explain(), by adding a method for db_explain().

If your database uses non-standard quoting (i.e. something other than " for identifiers and ' for strings), implement methods for sql_escape_string() and sql_escape_ident().

You may need to implement db_query_fields(), which should return a character vector giving the field names of a query.

At this point, all the basic verbs (summarise(), filter(), arrange(), mutate() etc) should also work, but it’s hard to test without some data.

copy_to()

Next, implement the methods that power copy_to() work. Once you’ve implemented these methods, you’ll be able copy datasets from R into your database, which will make testing much easier.

If the database doesn’t support a function, just return TRUE without doing anything. If you find these methods a very poor match to your backend, you may find it easier to provide a direct copy_to() method.

At this point, you should be able to copy the nycflights13 data packages into your database with (e.g.):

copy_nycflights13(src_mssql(...))
copy_lahman(src_mssql(...))

Don’t proceed further until this works, and you’ve verified that the basic single table verbs word.

Compute, collect and collapse

Next, check that collapse(), compute(), and collect() work.

Multi table verbs

Next check the multitable verbs:

sql translation

To finish off, you can add custom R -> SQL translation by providing a method for src_translate_env(). This function should return an object created by sql_variant(). See existing methods for examples.