Step by step guide for creating a package that depends on RStan

Stefan Siegert, Jonah Gabry, and Ben Goodrich

2018-08-20

Introduction

In this vignette we will walk through the steps necessary for creating an R package that depends on Stan by creating a package with one function that fits a simple linear regression. Before continuing, we recommend that you first read the other vignette Guidelines for Developers of R Packages Interfacing with Stan.

Creating the package skeleton

To start off, we use rstan_package_skeleton to initialise a bare-bones package directory. The name of our demo package will be rstanlm; it will fit a simple linear regression model using Stan.

library("rstantools")
rstan_package_skeleton(path = 'rstanlm')
This is rstantools version 1.5.1
Creating package skeleton for package: rstanlm
Running usethis::create_package ...
✔ Setting active project to '/private/var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpScUbme/rstanlm'
✔ Creating 'R/'
✔ Creating 'man/'
✔ Writing 'DESCRIPTION'
✔ Writing 'NAMESPACE'
✔ Creating 'tools/'
✔ Creating 'src/'
✔ Creating 'src/stan_files/'
✔ Creating 'src/stan_files/chunks/'
✔ Creating 'inst/'
✔ Creating 'inst/include/'
Updating R directory ...
Adding .travis.yml file ...
Updating DESCRIPTION with necessary dependencies ...
Updating NAMESPACE ...
Writing NAMESPACE
Writing rstanlm-package.Rd
Writing Read-and-delete-me file with additional instructions ...

Finished skeleton for package: rstanlm
Further steps are described in '/private/var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpScUbme/rstanlm/Read-and-delete-me'.

If we had existing .stan files to include with the package we could use the optional stan_files argument to rstan_package_skeleton to include them. Another option, which we’ll use below, is to add the Stan files once the basic structure of the package is in place.

We can now set the new working directory to the new package directory and view the contents. (Note: if using RStudio then by default the newly created project for the package will be opened in a new session and you will not need the call to setwd().)

setwd("rstanlm")
list.files(all.files = TRUE)
 [1] "."                  ".."                 ".Rbuildignore"     
 [4] ".travis.yml"        "DESCRIPTION"        "NAMESPACE"         
 [7] "R"                  "Read-and-delete-me" "inst"              
[10] "man"                "src"                "tools"             
file.show("DESCRIPTION")
Package: rstanlm
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R: 
    person(given = "First",
           family = "Last",
           role = c("aut", "cre"),
           email = "first.last@example.com")
Description: What the package does (one paragraph).
License: GPL (>=3)
Depends: 
    R (>= 3.4.0),
    Rcpp (>= 0.12.18),
    methods
Imports: 
    rstan (>= 2.17.3),
    rstantools (>= 1.5.0)
LinkingTo: 
    BH (>= 1.66.0-1),
    Rcpp (>= 0.12.18),
    RcppEigen (>= 0.3.3.4.0),
    StanHeaders (>= 2.17.2),
    rstan (>= 2.17.3)
Encoding: UTF-8
LazyData: true
NeedsCompilation: yes
SystemRequirements: GNU make
RoxygenNote: 6.0.1

Some of the sections in the DESCRIPTION file need to be edited by hand (Title, Author, Maintainer, and Description), but rstan_package_skeleton() has added the necessary packages and versions to Depends, Imports, and LinkingTo. It also added the SystemRequirements and NeedsCompilation fields.

Read-and-delete-me file

Before deleting the Read-and-delete-me file in the new package directory make sure to read it because it contains some important instructions about customizing your package:

file.show("Read-and-delete-me")
* The precompiled stanmodel objects will appear in a named list called 'stanmodels', 
and you can call them with something like rstan::sampling(stanmodels$foo, ...)
* You can put into src/stan_files/chunks any file that is needed by any .stan file in src/stan_files, 
* You can put into inst/include any C++ files that are needed by any .stan file in src/stan_files, 
but be sure to #include your C++ files in inst/include/meta_header.hpp
* While developing your package use devtools::install('.', local=FALSE) 
to reinstall the package AND recompile Stan programs, or set local=FALSE to skip the recompilation.

You can move this file out of the directory, delete it, or list it in the .Rbuildignore file if you want to keep it in the directory.

file.remove('Read-and-delete-me')
[1] TRUE

Stan files

Our package will call RStan’s sampling method to use MCMC to fit a simple linear regression model for an outcome variable y with a single predictor x. After writing the necessary Stan program, the file should be saved with a .stan extension in the src/stan_files/ subdirectory. We’ll save the following program to src/stan_files/lm.stan:

// Save this file as src/stan_files/lm.stan
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real intercept;
  real beta;
  real<lower=0> sigma;
}
model {
  // ... priors, etc.
  
  y ~ normal(intercept + beta * x, sigma);
}

The src/stan_files subdirectory can contain additional Stan programs if required by your package. During installation, all Stan programs will be compiled and saved in the list stanmodels that can then be used by R function in the package. The rule is that the Stan program compiled from the model code in src/stan_files/foo.stan is stored as list element stanmodels$foo.

R files

We next create the file R/lm_stan.R where we define the function lm_stan in which our compiled Stan model is being used. A comment block in roxygen2 syntax ensures that the function has a help file and that it is added to the NAMESPACE:

# Save this file as `R/lm_stan.R`

#' Bayesian linear regression with Stan
#'
#' @export
#' @param x Numeric vector of input values.
#' @param y Numberic vector of output values.
#' @param ... Arguments passed to `rstan::sampling` (e.g. iter, chains).
#' @return An object of class `stanfit` returned by `rstan::sampling`
#'
lm_stan <- function(x, y, ...) {
  standata <- list(x = x, y = y, N = length(y))
  out <- rstan::sampling(stanmodels$lm, data = standata, ...)
  return(out)
}

The top-level package file R/rstanlm-package.R has already been created by rstan_package_skeleton() but needs to be modified to decribe the functionality of our package:

file.show(file.path("R", "rstanlm-package.R"))
#' The 'rstanlm' package.
#' 
#' @description A DESCRIPTION OF THE PACKAGE
#' 
#' @docType package
#' @name rstanlm-package
#' @aliases rstanlm
#' @useDynLib rstanlm, .registration = TRUE
#' @import methods
#' @import Rcpp
#' @import rstantools
#' @importFrom rstan sampling
#' 
#' @references 
#' Stan Development Team (2018). RStan: the R interface to Stan. R package version 2.17.3. http://mc-stan.org
#' 
NULL

The description section needs to be manually edited but the necessary roxygen2 tags for specifying important parts of the NAMESPACE file have already been included.

Documentation

To update the NAMESPACE file and the rest of the documentation to include lm_stan we need to regenerate the documentation using roxygen2::roxygenise (or devtools::document):

roxygen2::roxygenise(clean=TRUE)
Writing NAMESPACE
Writing lm_stan.Rd
Writing rstanlm-package.Rd

Install and use

Finally, the package can be installed:

devtools::install(local=FALSE)
Loading rstanlm
Re-compiling rstanlm
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
  --no-environ --no-save --no-restore --quiet CMD INSTALL  \
  '/private/var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpScUbme/rstanlm'  \
  --library='/var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//RtmpScUbme/devtools_install_70877090a2b1'  \
  --no-R --no-data --no-help --no-demo --no-inst --no-docs --no-exec  \
  --no-multiarch --no-test-load --preclean 

The argument local=FALSE is necessary if you want to recompile the Stan models. If you only made a small change to the R code or the documentation, you can set local=TRUE to speed up the process.

The package can now be loaded and used like any other R package:

library("rstanlm")
fit <- lm_stan(y = rnorm(10), x = rnorm(10), iter = 500)

SAMPLING FOR MODEL 'lm' NOW (CHAIN 1).

Gradient evaluation took 1.3e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
Adjust your expectations accordingly!


Iteration:   1 / 500 [  0%]  (Warmup)
Iteration:  50 / 500 [ 10%]  (Warmup)
Iteration: 100 / 500 [ 20%]  (Warmup)
Iteration: 150 / 500 [ 30%]  (Warmup)
Iteration: 200 / 500 [ 40%]  (Warmup)
Iteration: 250 / 500 [ 50%]  (Warmup)
Iteration: 251 / 500 [ 50%]  (Sampling)
Iteration: 300 / 500 [ 60%]  (Sampling)
Iteration: 350 / 500 [ 70%]  (Sampling)
Iteration: 400 / 500 [ 80%]  (Sampling)
Iteration: 450 / 500 [ 90%]  (Sampling)
Iteration: 500 / 500 [100%]  (Sampling)

 Elapsed Time: 0.007888 seconds (Warm-up)
               0.006629 seconds (Sampling)
               0.014517 seconds (Total)


SAMPLING FOR MODEL 'lm' NOW (CHAIN 2).

Gradient evaluation took 4e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Adjust your expectations accordingly!


Iteration:   1 / 500 [  0%]  (Warmup)
Iteration:  50 / 500 [ 10%]  (Warmup)
Iteration: 100 / 500 [ 20%]  (Warmup)
Iteration: 150 / 500 [ 30%]  (Warmup)
Iteration: 200 / 500 [ 40%]  (Warmup)
Iteration: 250 / 500 [ 50%]  (Warmup)
Iteration: 251 / 500 [ 50%]  (Sampling)
Iteration: 300 / 500 [ 60%]  (Sampling)
Iteration: 350 / 500 [ 70%]  (Sampling)
Iteration: 400 / 500 [ 80%]  (Sampling)
Iteration: 450 / 500 [ 90%]  (Sampling)
Iteration: 500 / 500 [100%]  (Sampling)

 Elapsed Time: 0.00673 seconds (Warm-up)
               0.005018 seconds (Sampling)
               0.011748 seconds (Total)


SAMPLING FOR MODEL 'lm' NOW (CHAIN 3).

Gradient evaluation took 4e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Adjust your expectations accordingly!


Iteration:   1 / 500 [  0%]  (Warmup)
Iteration:  50 / 500 [ 10%]  (Warmup)
Iteration: 100 / 500 [ 20%]  (Warmup)
Iteration: 150 / 500 [ 30%]  (Warmup)
Iteration: 200 / 500 [ 40%]  (Warmup)
Iteration: 250 / 500 [ 50%]  (Warmup)
Iteration: 251 / 500 [ 50%]  (Sampling)
Iteration: 300 / 500 [ 60%]  (Sampling)
Iteration: 350 / 500 [ 70%]  (Sampling)
Iteration: 400 / 500 [ 80%]  (Sampling)
Iteration: 450 / 500 [ 90%]  (Sampling)
Iteration: 500 / 500 [100%]  (Sampling)

 Elapsed Time: 0.006052 seconds (Warm-up)
               0.004474 seconds (Sampling)
               0.010526 seconds (Total)


SAMPLING FOR MODEL 'lm' NOW (CHAIN 4).

Gradient evaluation took 7e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Adjust your expectations accordingly!


Iteration:   1 / 500 [  0%]  (Warmup)
Iteration:  50 / 500 [ 10%]  (Warmup)
Iteration: 100 / 500 [ 20%]  (Warmup)
Iteration: 150 / 500 [ 30%]  (Warmup)
Iteration: 200 / 500 [ 40%]  (Warmup)
Iteration: 250 / 500 [ 50%]  (Warmup)
Iteration: 251 / 500 [ 50%]  (Sampling)
Iteration: 300 / 500 [ 60%]  (Sampling)
Iteration: 350 / 500 [ 70%]  (Sampling)
Iteration: 400 / 500 [ 80%]  (Sampling)
Iteration: 450 / 500 [ 90%]  (Sampling)
Iteration: 500 / 500 [100%]  (Sampling)

 Elapsed Time: 0.007713 seconds (Warm-up)
               0.005446 seconds (Sampling)
               0.013159 seconds (Total)
print(fit)
Inference for Stan model: lm.
4 chains, each with iter=500; warmup=250; thin=1; 
post-warmup draws per chain=250, total post-warmup draws=1000.

           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
intercept  0.57    0.02 0.43 -0.26  0.33  0.56  0.80  1.50   378 1.01
beta       0.18    0.03 0.67 -1.19 -0.23  0.19  0.60  1.60   469 1.00
sigma      1.04    0.01 0.31  0.63  0.83  0.98  1.18  1.85   470 1.00
lp__      -4.64    0.07 1.39 -8.32 -5.33 -4.25 -3.60 -3.04   352 1.00

Samples were drawn using NUTS(diag_e) at Mon Aug 20 19:21:53 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).