Best practices for writing an API package

This document collects best practices for writing a package that connects to an web API. The goal is to help you produce a package that is safe, secure and keeps working in the long run.

If you're new to working with web APIs, start by reading “An introduction to APIs” by zapier.

Key info

When writing an API, it's best to start with some helper functions that capture the common conventions of the API in one place. These functions capture information like:

The following example shows how you might write these functions for the github API. Note that I've used lots of small function to avoid repeating code as much as possible. This is important when writing code that talks to APIs because APIs change all too frequently and you only want to have to change important facts in one place.

We start with functions to execute GET and POST requests:

github_GET <- function(path, ..., pat = github_pat()) {
  auth <- github_auth(pat)
  req <- GET("https://api.github.com/", path = path, auth, ...)
  github_check(req)

  req
}

github_POST <- function(path, body, ..., pat = github_pat()) {
  auth <- github_auth(pat)

  stopifnot(is.list(body))
  body_json <- jsonlite::toJSON(body)

  req <- POST("https://api.github.com/", path = path, body = body_json,
    auth, post, ...)
  github_check(req)

  req
}

These need some additional infrastructure to authenticate, check the responses and give useful error messages and parse responses:

github_auth <- function(pat = github_pat()) {
  authenticate(pat, "x-oauth-basic", "basic")
}

github_check <- function(req) {
  if (req$status_code < 400) return(invisible())

  message <- github_parse(req)$message
  stop("HTTP failure: ", req$status_code, "\n", message, call. = FALSE)
}

github_parse <- function(req) {
  text <- content(req, as = "text")
  if (identical(text, "")) stop("Not output to parse", call. = FALSE)
  jsonlite::fromJSON(text, simplifyVector = FALSE)
}

github_pat <- function() {
  Sys.getenv('GITHUB_PAT')
}

has_pat <- function() !identical(github_pat(), "")

github_pat() is just a shim, to get my personal access token from an environment variable. Later, you'll see a better, if lengthier, way of writing it in the authentication section.

Once you have these pieces in place, it's simple to implement API functions. For example, we could implement a rate_limit() function that tells you how many calls against the github API are available to you.

rate_limit <- function() {
  req <- github_GET("rate_limit")
  github_parse(req)
}

if (has_pat()) {
  str(rate_limit())
}
#> List of 2
#>  $ resources:List of 2
#>   ..$ core  :List of 3
#>   .. ..$ limit    : int 5000
#>   .. ..$ remaining: int 5000
#>   .. ..$ reset    : int 1418484931
#>   ..$ search:List of 3
#>   .. ..$ limit    : int 30
#>   .. ..$ remaining: int 30
#>   .. ..$ reset    : int 1418481391
#>  $ rate     :List of 3
#>   ..$ limit    : int 5000
#>   ..$ remaining: int 5000
#>   ..$ reset    : int 1418484931

After getting the first version getting working, you'll often want to polish the output to be more user friendly. For this example, we can parse the unix timestamps into more useful date types.

rate_limit <- function() {
  req <- github_GET("rate_limit")
  core <- github_parse(req)$resources$core

  reset <- as.POSIXct(core$reset, origin = "1970-01-01")
  cat(core$remaining, " / ", core$limit,
    " (Reset ", strftime(reset, "%H:%M:%S"), ")\n", sep = "")
}

if (has_pat()) {
  rate_limit()
}
#> 5000 / 5000 (Reset 09:35:31)

Depending on the complexity of the API, you might want to keep separate the functions that return a request object and the functions that parse it into a useful R object.

Parsing output and posting input

Most APIs communicate either with json or xml. To work with json, I recommend the jsonlite package. To work with xml, use the xml package.

httr provides some default parsers with content(..., as = "auto") but I don't recommend using them inside a package. Instead get the content as text with content(..., as = "text") and parse it yourself. The API might return invalid data, but this should be rare, so you can just rely on the parser to provide a useful error message.

github_parse <- function(req) {
  text <- content(req, as = "text")
  if (identical(text, "")) stop("")
  jsonlite::fromJSON(text, simplifyVector = FALSE)
}

Many APIs use content negotiation to determine whats sort of data to send back. If the API you're wrapping does this, then you might find accept_json() and accept_xml() to be useful.

Responding to errors

First, check the HTTP status code. Status codes in the 400 range usually mean that you've done something wrong. Status codes in the 500 range typically mean that something has gone wrong on the server side. This however, might be that you sent the server something badly formed.

When you get an error, often the body of the request will contain some useful information, so you should parse it and pull out the error. This will vary based on the API. For github, we can parse it as follows:

github_parse <- function(req) {
  text <- content(req, as = "text")
  if (identical(text, "")) stop("Not output to parse", call. = FALSE)
  jsonlite::fromJSON(text, simplifyVector = FALSE)
}

If the API returns special errors for common problems, you might want to provide more detail in the error. For example, if you run out of requests and are rate limited you might want to tell the user how long to wait until they can make the next request (or even automatically wait that long!).

Authentication

The most common forms of authentication are OAuth and http basic auth:

You also need some way to preserve user credentials so that they don't need to be re-entered multiple times. If you use OAuth, httr will take care of this for. For other use cases, I recommend using environment variables. The following function retrieves your PAT from an environmental variable called GITHUB_PAT, telling you how it set it if not. The devtools package needs to access your github personal access token to install packages from private repos.

github_pat <- function(force = FALSE) {
  env <- Sys.getenv('GITHUB_PAT')
  if (!identical(env, "") && !force) return(env)

  if (!interactive()) {
    stop("Please set env var GITHUB_PAT to your github personal access token",
      call. = FALSE)
  }

  message("Couldn't find env var GITHUB_PAT. See ?github_pat for more details.")
  message("Please enter your PAT and press enter:")
  pat <- readline(": ")

  if (identical(pat, "")) {
    stop("Github personal access token entry failed", call. = FALSE)
  }

  message("Updating GITHUB_PAT env var to PAT")
  Sys.setenv(GITHUB_PAT = pat)

  pat
}

Encourage your users to store their important information once (see below), rather than typing it into the console - it's easy to accidentally publish your .Rhistory and you don't want it to contain private data.

Appendix: API key best practices

If your package supports an authentication workflow based on an API key or token, encourage users to store it in an environment variable. We illustrate this using the github R package, which wraps the Github v3 API. Tailor this template to your API + package and include in README.md or a vignette.

  1. Create a personal access token in the Applications area of your GitHub personal settings. Copy token to the clipboard.

  2. Identify your home directory. Not sure? Enter normalizePath("~/") in the R console.

  3. Create a new text file. If in RStudio, do File > New File > Text file.

  4. Create a line like this:

    GITHUB_TOKEN=blahblahblahblahblahblah
    

    where the name GITHUB_TOKEN reminds you which API this is for and blahblahblahblahblahblah is your token, pasted from the clipboard.

  5. Make sure the last line in the file is empty (if it isn't R will silently fail to load the file. If you're using an editor that shows line numbers, there should be two lines, where the second one is empty.

  6. Save in your home directory with the filename .Renviron. If questioned, YES you do want to use a filename that begins with a dot ..

    Note that by default dotfiles are usually hidden. But within RStudio, the file browser will make .Renviron visible and therefore easy to edit in the future.

  7. Restart R. .Renviron is processed only at the start of an R session.

  8. Use Sys.getenv() to access your token. For example, here's how to use your GITHUB_TOKEN with the github package:

    library(github)
    ctx <- create.github.context(access_token = Sys.getenv("GITHUB_TOKEN"))
    # ... proceed to use other package functions to open issues, etc.
    

FAQ: Why define this environment variable via .Renviron instead of in .bash_profile or .bashrc?

Because there are many combinations of OS and ways of running R where the .Renviron approach “just works” and the bash stuff does not. When R is a child process of, say, Emacs or RStudio, you can't always count on environment variables being passed to R. Put them in an R-specific start-up file and save yourself some grief.