MCMC output format

2021-09-24

In this tutorial, we will have a closer look at the object returned by calling run_mcmc() on a network model. Understanding the structure of the object containing the results of a run is important for model diagnostics and interpretation.

The default format: mcmc.list

To quickly obtain an MCMC run output for us to examine, let's run the simple model aquarium_mod which is provided with the package. Feel free to read the help ?aquarium_mod if you are curious about the model itself.

library(isotracer)
aquarium_mod
fit <- run_mcmc(aquarium_mod, iter = 1000)

By default, run_mcmc() returns an mcmc.list object. An mcmc.list has a simple format to store the content of parallel MCMC chains:

length(fit)
## [1] 4
str(fit[[1]])
##  'mcmc' num [1:500, 1:8] 0.151 0.122 0.098 0.118 0.11 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:8] "eta" "lambda_algae" "lambda_daphnia" "lambda_NH4" ...
##  - attr(*, "mcpar")= num [1:3] 501 1000 1

The output above can seem a little obscure if you are not familiar with R data structures, but in a nutshell it tells us that the mcmc.list is basically a list with one element per chain, each chain being stored as a matrix.

The mcmc.list class is implemented by the coda package, and it has the advantage of being recognized by many other R packages dealing with Bayesian MCMC such as bayesplot of ggmcmc.

In the isotracer package, the returned fit is very slightly extended compared to the base mcmc.list class:

class(fit)
## [1] "networkModelStanfit" "mcmc.list"

By having also a networkModelStanfit class, the output from run_mcmc() can be recognized automatically by some methods implemented in isotracer, such as plot():

plot(fit)
# Note: the figure below only shows a few of the traceplots for vignette concision

plot of chunk unnamed-chunk-7

How to convert the default output to a table?

An mcmc.list object can be converted to an even simpler, flat matrix:

z <- as.matrix(fit)
head(z)
##            eta lambda_algae lambda_daphnia  lambda_NH4 upsilon_algae_to_daphnia
## [1,] 0.1508579   0.03651300    0.002671921 0.039642685               0.06069528
## [2,] 0.1221929   0.06116589    0.004776632 0.043504167               0.04786831
## [3,] 0.0979891   0.05001424    0.021092926 0.007698263               0.09994864
## [4,] 0.1183222   0.14622338    0.005098195 0.169566673               0.08385899
## [5,] 0.1101955   0.01013583    0.011984862 0.005719459               0.07144323
## [6,] 0.1689449   0.07063875    0.016750328 0.013090974               0.07063488
##      upsilon_daphnia_to_NH4 upsilon_NH4_to_algae      zeta
## [1,]             0.05643808            0.3594141 0.2159355
## [2,]             0.04523513            0.3791117 0.2964442
## [3,]             0.05856778            0.2990949 0.1954708
## [4,]             0.05060606            0.3191617 0.2549999
## [5,]             0.05789025            0.3490589 0.2968203
## [6,]             0.03805178            0.2727921 0.1820191
str(z)
##  num [1:2000, 1:8] 0.151 0.122 0.098 0.118 0.11 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:8] "eta" "lambda_algae" "lambda_daphnia" "lambda_NH4" ...

or to a data frame:

z <- as.data.frame(as.matrix(fit))
head(z)
##         eta lambda_algae lambda_daphnia  lambda_NH4 upsilon_algae_to_daphnia
## 1 0.1508579   0.03651300    0.002671921 0.039642685               0.06069528
## 2 0.1221929   0.06116589    0.004776632 0.043504167               0.04786831
## 3 0.0979891   0.05001424    0.021092926 0.007698263               0.09994864
## 4 0.1183222   0.14622338    0.005098195 0.169566673               0.08385899
## 5 0.1101955   0.01013583    0.011984862 0.005719459               0.07144323
## 6 0.1689449   0.07063875    0.016750328 0.013090974               0.07063488
##   upsilon_daphnia_to_NH4 upsilon_NH4_to_algae      zeta
## 1             0.05643808            0.3594141 0.2159355
## 2             0.04523513            0.3791117 0.2964442
## 3             0.05856778            0.2990949 0.1954708
## 4             0.05060606            0.3191617 0.2549999
## 5             0.05789025            0.3490589 0.2968203
## 6             0.03805178            0.2727921 0.1820191
str(z)
## 'data.frame':    2000 obs. of  8 variables:
##  $ eta                     : num  0.151 0.122 0.098 0.118 0.11 ...
##  $ lambda_algae            : num  0.0365 0.0612 0.05 0.1462 0.0101 ...
##  $ lambda_daphnia          : num  0.00267 0.00478 0.02109 0.0051 0.01198 ...
##  $ lambda_NH4              : num  0.03964 0.0435 0.0077 0.16957 0.00572 ...
##  $ upsilon_algae_to_daphnia: num  0.0607 0.0479 0.0999 0.0839 0.0714 ...
##  $ upsilon_daphnia_to_NH4  : num  0.0564 0.0452 0.0586 0.0506 0.0579 ...
##  $ upsilon_NH4_to_algae    : num  0.359 0.379 0.299 0.319 0.349 ...
##  $ zeta                    : num  0.216 0.296 0.195 0.255 0.297 ...

or to a tibble:

z <- tibble::as_tibble(as.matrix(fit))
z
## # A tibble: 2,000 × 8
##       eta lambda_algae lambda_daphnia lambda_NH4 upsilon_algae_to_… upsilon_daphnia_…
##     <dbl>        <dbl>          <dbl>      <dbl>              <dbl>             <dbl>
##  1 0.151        0.0365        0.00267    0.0396              0.0607            0.0564
##  2 0.122        0.0612        0.00478    0.0435              0.0479            0.0452
##  3 0.0980       0.0500        0.0211     0.00770             0.0999            0.0586
##  4 0.118        0.146         0.00510    0.170               0.0839            0.0506
##  5 0.110        0.0101        0.0120     0.00572             0.0714            0.0579
##  6 0.169        0.0706        0.0168     0.0131              0.0706            0.0381
##  7 0.165        0.0215        0.0186     0.00253             0.156             0.0571
##  8 0.207        0.0155        0.0145     0.0610              0.152             0.0505
##  9 0.0780       0.0417        0.00443    0.0288              0.0634            0.0484
## 10 0.0905       0.0486        0.0598     0.103               0.0657            0.0484
## # … with 1,990 more rows, and 2 more variables: upsilon_NH4_to_algae <dbl>,
## #   zeta <dbl>

How to calculate derived parameters?

Converting your output to one of those simple tabular formats can be useful if you want to manipulate and perform operations on your MCMC samples.

However, for simple manipulations, isotracer provides convenient methods to perform calculations on parameter chains directly from the output of run_mcmc(). You can thus produce derived parameter chains directly from the mcmc.list object, without having to convert your output to another format:

algal_total_out <- fit[, "upsilon_algae_to_daphnia"] + fit[, "lambda_algae"]
algal_turnover <- 1 / algal_total_out
plot(algal_turnover)

plot of chunk unnamed-chunk-11

You can read more about this in the vignette about calculating derived parameters.

How to combine derived parameters?

You can combine derived parameters into a single mcmc.listobject using the usual c() syntax. This can be convenient for more compact plotting or summary calculations:

my_derived <- c("out rate" = algal_total_out, "turnover" = algal_turnover)
plot(my_derived)

plot of chunk unnamed-chunk-12

summary(my_derived)
## 
## Iterations = 501:1000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 500 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##            Mean      SD Naive SE Time-series SE
## out rate 0.1428 0.04711 0.001053       0.001286
## turnover 7.8677 2.89297 0.064689       0.090681
## 
## 2. Quantiles for each variable:
## 
##             2.5%    25%    50%    75%   97.5%
## out rate 0.06775 0.1071 0.1386 0.1738  0.2462
## turnover 4.06253 5.7544 7.2146 9.3359 14.7599

A more detailed format: stanfit

Calling run_mcmc() will run a Stan model behind the scenes. Stan is great since it will let you know loudly when something went wrong with the run, such as problems with divergent chains or low Bayesian fraction of missing information. Such problems should not be ignored! The Stan development team has a nice page explaining Stan's warnings.

In any case, if something went wrong with your run, you might want to have a more complete output than simply the mcmc.list object. You can ask run_mcmc() to return the original stanfit object produced by Stan with:

fit2 <- run_mcmc(aquarium_mod, iter = 1000, stanfit = TRUE)

fit2 is now a regular stanfit object:

class(fit2)
## [1] "stanfit"

This is a more complicated type of object than an mcmc.list, but it also contains much more information about the Stan run. It also comes with the benefit of the existing methods for stanfit object, for example:

rstan::plot(fit2)

plot of chunk unnamed-chunk-16

You can go through Stan documentation for more details about this format. If you are reading about solving Stan model issues on online forums and the suggested solutions require to examine some Stan output, that's the object you want to look at!

For example, you can examine it with ShinyStan:

library(shinystan)
launch_shinystan(fit2)