DBI: R Database Interface

DBI defines an interface for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations (so-called DBI backends).

Definition

A DBI backend is an R package which imports the DBI and methods packages. For better or worse, the names of many existing backends start with ‘R’, e.g., RSQLite, RMySQL, RSQLServer; it is up to the backend author to adopt this convention or not.

DBI classes and methods

A backend defines three classes, which are subclasses of DBIDriver, DBIConnection, and DBIResult. The backend provides implementation for all methods of these base classes that are defined but not implemented by DBI. All methods have an ellipsis ... in their formals.

Construction of the DBIDriver object

The backend must support creation of an instance of its DBIDriver subclass with a constructor function. By default, its name is the package name without the leading ‘R’ (if it exists), e.g., SQLite for the RSQLite package. However, backend authors may choose a different name. The constructor must be exported, and it must be a function that is callable without arguments. DBI recommends to define a constructor with an empty argument list.

Examples

RSQLite::SQLite()

Determine the SQL data type of an object

This section describes the behavior of the following method:

dbDataType(dbObj, obj, ...)

Description

Returns an SQL string that describes the SQL data type to be used for an object. The default implementation of this generic determines the SQL type of an R object according to the SQL 92 specification, which may serve as a starting point for driver implementations. DBI also provides an implementation for data.frame which will return a character vector giving the type for each column in the dataframe.

Arguments

dbObj

A object inheriting from DBIDriver or DBIConnection

obj

An R object whose SQL type we want to determine.

...

Other arguments passed on to methods.

Details

The data types supported by databases are different than the data types in R, but the mapping between the primitive types is straightforward:

  • Any of the many fixed and varying length character types are mapped to character vectors

  • Fixed-precision (non-IEEE) numbers are mapped into either numeric or integer vectors.

Notice that many DBMS do not follow IEEE arithmetic, so there are potential problems with under/overflows and loss of precision.

Value

dbDataType() returns the SQL type that corresponds to the obj argument as a non-empty character string. For data frames, a character vector with one element per column is returned. An error is raised for invalid values for the obj argument such as a NULL value.

Specification

The backend can override the dbDataType() generic for its driver class.

This generic expects an arbitrary object as second argument. To query the values returned by the default implementation, run example(dbDataType, package = "DBI"). If the backend needs to override this generic, it must accept all basic R data types as its second argument, namely logical, integer, numeric, character, dates (see Dates), date-time (see DateTimeClasses), and difftime. If the database supports blobs, this method also must accept lists of raw vectors, and blob::blob objects. As-is objects (i.e., wrapped by I()) must be supported and return the same results as their unwrapped counterparts. The SQL data type for factor and ordered is the same as for character. The behavior for other object types is not specified.

All data types returned by dbDataType() are usable in an SQL statement of the form "CREATE TABLE test (a ...)".

Examples

dbDataType(ANSI(), 1:5)
dbDataType(ANSI(), 1)
dbDataType(ANSI(), TRUE)
dbDataType(ANSI(), Sys.Date())
dbDataType(ANSI(), Sys.time())
dbDataType(ANSI(), Sys.time() - as.POSIXct(Sys.Date()))
dbDataType(ANSI(), c("x", "abc"))
dbDataType(ANSI(), list(raw(10), raw(20)))
dbDataType(ANSI(), I(3))

dbDataType(ANSI(), iris)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbDataType(con, 1:5)
dbDataType(con, 1)
dbDataType(con, TRUE)
dbDataType(con, Sys.Date())
dbDataType(con, Sys.time())
dbDataType(con, Sys.time() - as.POSIXct(Sys.Date()))
dbDataType(con, c("x", "abc"))
dbDataType(con, list(raw(10), raw(20)))
dbDataType(con, I(3))

dbDataType(con, iris)

dbDisconnect(con)

Create a connection to a DBMS

This section describes the behavior of the following method:

dbConnect(drv, ...)

Description

Connect to a DBMS going through the appropriate authentication procedure. Some implementations may allow you to have multiple connections open, so you may invoke this function repeatedly assigning its output to different objects. The authentication mechanism is left unspecified, so check the documentation of individual drivers for details.

Arguments

drv

an object that inherits from DBIDriver, or an existing DBIConnection object (in order to clone an existing connection).

...

authentication arguments needed by the DBMS instance; these typically include user, password, host, port, dbname, etc. For details see the appropriate DBIDriver.

Value

dbConnect() returns an S4 object that inherits from DBIConnection. This object is used to communicate with the database engine.

Specification

DBI recommends using the following argument names for authentication parameters, with NULL default:

  • user for the user name (default: current user)

  • password for the password

  • host for the host name (default: local connection)

  • port for the port number (default: local connection)

  • dbname for the name of the database on the host, or the database file name

The defaults should provide reasonable behavior, in particular a local connection for host = NULL. For some DBMS (e.g., PostgreSQL), this is different to a TCP/IP connection to localhost.

Examples

# SQLite only needs a path to the database. (Here, ":memory:" is a special
# path that creates an in-memory database.) Other database drivers
# will require more details (like user, password, host, port, etc.)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
con

dbListTables(con)

dbDisconnect(con)

Disconnect (close) a connection

This section describes the behavior of the following method:

dbDisconnect(conn, ...)

Description

This closes the connection, discards all pending work, and frees resources (e.g., memory, sockets).

Arguments

conn

A DBIConnection object, as returned by dbConnect().

...

Other parameters passed on to methods.

Value

dbDisconnect() returns TRUE, invisibly.

Specification

A warning is issued on garbage collection when a connection has been released without calling dbDisconnect(). A warning is issued immediately when calling dbDisconnect() on an already disconnected or invalid connection.

Examples

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbDisconnect(con)

Execute a query on a given database connection

This section describes the behavior of the following method:

dbSendQuery(conn, statement, ...)

Description

The dbSendQuery() method only submits and synchronously executes the SQL query to the database engine. It does not extract any records — for that you need to use the dbFetch() method, and then you must call dbClearResult() when you finish fetching the records you need. For interactive use, you should almost always prefer dbGetQuery().

Arguments

conn

A DBIConnection object, as returned by dbConnect().

statement

a character string containing SQL.

...

Other parameters passed on to methods.

Details

This method is for SELECT queries only. Some backends may support data manipulation queries through this method for compatibility reasons. However, callers are strongly encouraged to use dbSendStatement() for data manipulation statements.

The query is submitted to the database server and the DBMS executes it, possibly generating vast amounts of data. Where these data live is driver-specific: some drivers may choose to leave the output on the server and transfer them piecemeal to R, others may transfer all the data to the client – but not necessarily to the memory that R manages. See individual drivers' dbSendQuery() documentation for details.

Value

dbSendQuery() returns an S4 object that inherits from DBIResult. The result set can be used with dbFetch() to extract records. Once you have finished using a result, make sure to clear it with dbClearResult(). An error is raised when issuing a query over a closed or invalid connection, if the syntax of the query is invalid, or if the query is not a non-NA string.

Specification

No warnings occur under normal conditions. When done, the DBIResult object must be cleared with a call to dbClearResult(). Failure to clear the result set leads to a warning when the connection is closed.

If the backend supports only one open result set per connection, issuing a second query invalidates an already open result set and raises a warning. The newly opened result set is valid and must be cleared with dbClearResult().

Examples

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "mtcars", mtcars)
rs <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4;")
dbFetch(rs)
dbClearResult(rs)

dbDisconnect(con)

Fetch records from a previously executed query

This section describes the behavior of the following methods:

dbFetch(res, n = -1, ...)

fetch(res, n = -1, ...)

Description

Fetch the next n elements (rows) from the result set and return them as a data.frame.

Arguments

res

An object inheriting from DBIResult, created by dbSendQuery().

n

maximum number of records to retrieve per fetch. Use n = -1 or n = Inf to retrieve all pending records. Some implementations may recognize other special values.

...

Other arguments passed on to methods.

Details

fetch() is provided for compatibility with older DBI clients - for all new code you are strongly encouraged to use dbFetch(). The default implementation for dbFetch() calls fetch() so that it is compatible with existing code. Modern backends should implement for dbFetch() only.

Value

dbFetch() always returns a data.frame with as many rows as records were fetched and as many columns as fields in the result set, even if the result is a single value or has one or zero rows. An attempt to fetch from a closed result set raises an error. If the n argument is not an atomic whole number greater or equal to -1 or Inf, an error is raised, but a subsequent call to dbFetch() with proper n argument succeeds. Calling dbFetch() on a result set from a data manipulation query created by dbSendStatement() can be fetched and return an empty data frame, with a warning.

Specification

Fetching multi-row queries with one or more columns be default returns the entire result. Multi-row queries can also be fetched progressively by passing a whole number (integer or numeric) as the n argument. A value of Inf for the n argument is supported and also returns the full result. If more rows than available are fetched, the result is returned in full without warning. If fewer rows than requested are returned, further fetches will return a data frame with zero rows. If zero rows are fetched, the columns of the data frame are still fully typed. Fetching fewer rows than available is permitted, no warning is issued when clearing the result set.

A column named row_names is treated like any other column.

The column types of the returned data frame depend on the data returned:

  • integer for integer values between -2^31 and 2^31 - 1

  • numeric for numbers with a fractional component

  • logical for Boolean values (some backends may return an integer)

  • character for text

  • lists of raw for blobs (with NULL entries for SQL NULL values)

  • coercible using as.Date() for dates (also applies to the return value of the SQL function current_date)

  • coercible using hms::as.hms() for times (also applies to the return value of the SQL function current_time)

  • coercible using as.POSIXct() for timestamps (also applies to the return value of the SQL function current_timestamp)

  • NA for SQL NULL values

If dates and timestamps are supported by the backend, the following R types are used:

  • Date for dates (also applies to the return value of the SQL function current_date)

  • POSIXct for timestamps (also applies to the return value of the SQL function current_timestamp)

R has no built-in type with lossless support for the full range of 64-bit or larger integers. If 64-bit integers are returned from a query, the following rules apply:

  • Values are returned in a container with support for the full range of valid 64-bit values (such as the integer64 class of the bit64 package)

  • Coercion to numeric always returns a number that is as close as possible to the true value

  • Loss of precision when converting to numeric gives a warning

  • Conversion to character always returns a lossless decimal representation of the data

Examples

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "mtcars", mtcars)

# Fetch all results
rs <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
dbFetch(rs)
dbClearResult(rs)

# Fetch in chunks
rs <- dbSendQuery(con, "SELECT * FROM mtcars")
while (!dbHasCompleted(rs)) {
  chunk <- dbFetch(rs, 10)
  print(nrow(chunk))
}

dbClearResult(rs)
dbDisconnect(con)

Clear a result set

This section describes the behavior of the following method:

dbClearResult(res, ...)

Description

Frees all resources (local and remote) associated with a result set. In some cases (e.g., very large result sets) this can be a critical step to avoid exhausting resources (memory, file descriptors, etc.)

Arguments

res

An object inheriting from DBIResult.

...

Other arguments passed on to methods.

Value

dbClearResult() returns TRUE, invisibly, for result sets obtained from both dbSendQuery() and dbSendStatement(). An attempt to close an already closed result set issues a warning in both cases.

Specification

dbClearResult() frees all resources associated with retrieving the result of a query or update operation. The DBI backend can expect a call to dbClearResult() for each dbSendQuery() or dbSendStatement() call.

Examples

con <- dbConnect(RSQLite::SQLite(), ":memory:")

rs <- dbSendQuery(con, "SELECT 1")
print(dbFetch(rs))

dbClearResult(rs)
dbDisconnect(con)

Bind values to a parameterized/prepared statement

This section describes the behavior of the following method:

dbBind(res, params, ...)

Description

For parametrized or prepared statements, the dbSendQuery() and dbSendStatement() functions can be called with statements that contain placeholders for values. The dbBind() function binds these placeholders to actual values, and is intended to be called on the result set before calling dbFetch() or dbGetRowsAffected().

Arguments

res

An object inheriting from DBIResult.

params

A list of bindings, named or unnamed.

...

Other arguments passed on to methods.

Details

DBI supports parametrized (or prepared) queries and statements via the dbBind() generic. Parametrized queries are different from normal queries in that they allow an arbitrary number of placeholders, which are later substituted by actual values. Parametrized queries (and statements) serve two p