Plotting Posterior Distributions with ggdistribute

Joseph M. Burling

2018-11-15

Introduction

The ggdistribute package is an extension for plotting posterior or other types of unimodal distributions that require overlaying information about a distribution’s intervals. It makes use of the ggproto system to extend ggplot2, providing additional “geoms”, “stats”, and “positions.” The extensions integrate with existing ggplot2 layer elements.

To access the exported ggplot2 extensions, load the package:

library(ggdistribute)
#> ggdistribute loaded

The data object below is a randomly generated dataset of samples drawn from five different normal distributions, one for each value of mu. Two factors, Condition and Group, are assigned according to the generated values. For each value of mu, 1000 samples are generated.

data <- data_normal_sample(mu = c(-1, 0, 2, 5, 10), n = 1000)
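Before plotting, it can help to look at what was generated. The sketch below only inspects the object. It assumes the factor columns are named Condition and Group, as the description above suggests; check the output of str() if your version of the package names them differently.

```r
library(ggdistribute)

# Regenerate the sample so this snippet runs on its own
data <- data_normal_sample(mu = c(-1, 0, 2, 5, 10), n = 1000)

# Column names and types
str(data)

# Cross-tabulate the two factors (column names are an assumption)
table(data$Condition, data$Group)
```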

Extensions passed to ggplot2::layer

geom_posterior(...), GeomPosterior

This is the main wrapper function for generating the posterior distribution grobs. See help("geom_posterior") for a list of options passed to the ggproto object GeomPosterior. When geom_posterior() is called with no arguments and no y aesthetic, it estimates the normalized density (which integrates to 1) and multiplies it by the number of data points. This is the same as using aes(y = ..count..) with geom_posterior.
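A minimal sketch of the default behavior described above. It assumes the sampled values are stored in a column named value (the extended example later in this document maps x=value for a different dataset, but the column name here is an assumption).

```r
library(ggplot2)
library(ggdistribute)

data <- data_normal_sample(mu = c(-1, 0, 2, 5, 10), n = 1000)

# Only x is mapped, so the height is the density scaled by the sample count
p_default <- ggplot(data, aes(x = value)) +
  geom_posterior()

# Explicit form that the text describes as equivalent
p_count <- ggplot(data, aes(x = value, y = ..count..)) +
  geom_posterior()

print(p_default)
print(p_count)
```

Recent ggplot2 releases may warn that the ..count.. notation is deprecated; the notation is kept here to match the text.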

If y is discrete, the densities are justified at the bottom of the y value.
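As an illustration, mapping the Condition factor to y should draw one density per condition, each resting on its own y value. The column names value and Condition are assumptions about the generated data.

```r
library(ggplot2)
library(ggdistribute)

data <- data_normal_sample(mu = c(-1, 0, 2, 5, 10), n = 1000)

# Discrete y: each density is justified at the bottom of its y value
p_discrete <- ggplot(data, aes(x = value, y = Condition)) +
  geom_posterior()

print(p_discrete)
```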

Mirroring can be turned on to generate densities similar to those of geom_violin. Like other ggplot2 geoms, geom_posterior can be combined with standard features such as faceting.
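A short sketch combining both ideas. The mirror argument appears in the extended example later in this document; the column names value, Condition, and Group are assumptions about the generated data, and the faceting layout is just one possible choice.

```r
library(ggplot2)
library(ggdistribute)

data <- data_normal_sample(mu = c(-1, 0, 2, 5, 10), n = 1000)

# Mirrored (violin-like) densities, one facet row per Group
p_mirror <- ggplot(data, aes(x = value, y = Condition)) +
  geom_posterior(mirror = TRUE) +
  facet_grid(Group ~ ., scales = "free_y")

print(p_mirror)
```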

stat_density_ci(...), StatDensityCI

The StatDensityCI class is the default stat for the geom_posterior wrapper. It computes densities for each group and finds the confidence intervals. See help("stat_density_ci") for additional options and computed variables.

The stat_density_ci wrapper can also be used with other geoms that make use of density, count, and scaled variables.

position_spread(...), PositionSpread

The PositionSpread class spreads out overlapping densities within the range of their y axis value. For instance, in this dataset Conditions A-D each contain two groups, while Condition E contains only one. Distributions within each Condition are resized and spread over the group’s y interval. Padding is turned off below to see where distributions begin and end.
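One way this could look in code is sketched here. The padding argument is taken from the extended example later in this document; the column names and the choice to distinguish groups by fill are assumptions, and other position_spread() options are left at their defaults.

```r
library(ggplot2)
library(ggdistribute)

data <- data_normal_sample(mu = c(-1, 0, 2, 5, 10), n = 1000)

# Groups within each Condition share that Condition's y interval;
# padding = 0 removes the gaps so the start and end of each density are visible
p_spread <- ggplot(data, aes(x = value, y = Condition, fill = Group)) +
  geom_posterior(position = position_spread(padding = 0))

print(p_spread)
```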

Extended example

The package function example_plot() gives an overview of combining ggdistribute with other ggplot2 elements. The contents of this function are printed below, with comments that give details about the parts that extend ggplot2.

# attach ggplot2 for the standard layers used below
library(ggplot2)
library(ggdistribute)

# color palette
colors <- mejr_palette()

plt <-
  ggplot(sre_data(5000), aes(y=effect)) +
  # ggdistribute specific elements -------------------------------------------
  geom_posterior(
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # geom_posterior() aesthetics mappings
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    aes(x=value, fill=contrast),
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # options passed to stat_density_ci() for estimating intervals
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    interp_thresh=.001, # threshold for interpolating segment gaps
    center_stat="median", # measure of central tendency
    ci_width=0.90, # width corresponding to CI segments
    interval_type="ci", # quantile intervals, not highest density intervals
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # options passed to stat_density_ci() for estimating density
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    bw=".nrd0", # bandwidth estimator type
    adjust=1.5, # adjustment to bandwidth
    n=1024, # number of samples in final density
    trim=.005, # trim `x` this proportion before estimating density
    cut=1.5, # tail extension for zero density estimation
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # geom_posterior() options
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    draw_ci=TRUE, # toggle showing confidence interval parts
    draw_sd=TRUE, # toggle showing standard deviation parts
    mirror=FALSE, # toggle horizontal violin distributions
    midline=NULL, # line displaying center of dist. (NULL=aes color)
    brighten=c(3, 0, 1.333), # additive adjustment of segment fill colors
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # position_spread() options
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    position=position_spread(
      reverse=TRUE, # order of spread groups within panels
      padding=0.3, # shrink heights of distributions
      height="panel" # scale by heights within panels
    ), #
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # standard ggplot layer options
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    size=0.15, color=colors$gray, vjust=0.7, show.legend=FALSE
  ) +
  # standard ggplot2 elements ------------------------------------------------
  geom_vline(alpha=0.5, color=colors$gray, size=0.333, linetype=1, xintercept=0) +
  scale_x_continuous(breaks=seq(-1, 1, .05)) +
  facet_grid("contrast ~ .", scales="free_y", space="free_y") +
  scale_fill_manual(values=c(colors$yellow, colors$magenta, colors$cyan)) +
  labs(x="Difference in accuracy (posterior predictions)") +
  theme(
    legend.position="none", strip.text.y=element_text(angle=0, hjust=0.5),
    panel.border=element_rect(fill=NA, color=colors$lightgray, size=0.67),
    panel.grid=element_blank(), panel.ontop=FALSE, axis.title.y=element_blank(),
    plot.margin=margin(t=2, r=4, b=2, l=2, unit="pt")
  )

plot(plt)