Labeling small strata

Jason Cory Brunson

2018-10-21

Setup

This brief vignette uses the vaccinations dataset included in ggalluvial. As in the technical introduction, the order of the levels is reversed to be more intuitive. Objects from other ggplot2 extensions are accessed via :: and :::.

knitr::opts_chunk$set(fig.width = 6, fig.height = 4, fig.align = "center")
library(ggalluvial)
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))

Problem

The issue on the table: Strata are most helpful when they’re overlaid with text labels. Yet the strata often vary in height, and the labels in length, to such a degree that fitting the text inside the strata at a uniform size renders them illegible. In principle, the user could treat size as a variable aesthetic and manually fit text to strata, but this is cumbersome, and doesn’t help anyway in cases where large text is needed. Fortunately, two wonderful packages provide more elegant solutions.

To illustrate the problem, check out the diagram below. It’s by no means an egregious case, but it’ll do. (For a more practical example, see this question on StackOverflow, which prompted this vignette.)

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject, y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, 0)) +
  geom_flow(width = 1/4) +
  geom_stratum(alpha = .5, width = 1/4) +
  geom_text(stat = "stratum", size = 4) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses", "labeled using `geom_text()`")

Solutions

Two ggplot2 extensions are well-suited to this problem: ggrepel and ggfittext. They provide new geom layers that use the output of existing stat layers to situate text: ggrepel::geom_text_repel() takes the same aesthetics as ggplot2::geom_text(), namely x, y, and label. In contrast, ggfittext::geom_fit_text() only specifically requires label but also needs enough information to determine the rectangle that will contain the text. This can be encoded as xmin and xmax or as x and width for the horizontal direction, and as ymin and ymax or as y and height for the vertical direction. Conveniently, ggalluvial::stat_stratum() produces more than enough information for both geoms, including x, xmin, xmax, and their vertical counterparts.

All this can be gleaned from the ggproto objects that construct the layers:

print(ggrepel::GeomTextRepel$required_aes)
## [1] "x"     "y"     "label"
print(ggfittext:::GeomFitText$required_aes)
## [1] "label"
print(ggfittext:::GeomFitText$setup_data)
## <ggproto method>
##   <Wrapper function>
##     function (...) 
## f(...)
## 
##   <Inner function (f)>
##     function (data, params) 
## {
##     if ("width" %in% names(data)) {
##         warning("`width` is now an argument, not an aesthetic, and will be removed in a future version")
##     }
##     if ("height" %in% names(data)) {
##         warning("`height` is now an argument, not an aesthetic, and will be removed in a future version")
##     }
##     if (!(("xmin" %in% names(data) & "xmax" %in% names(data)) | 
##         ("x" %in% names(data)))) {
##         stop("geom_fit_text needs either 'xmin' and 'xmax', or 'x'", 
##             .call = FALSE)
##     }
##     if (!("ymin" %in% names(data) & "ymax" %in% names(data) | 
##         "y" %in% names(data))) {
##         stop("geom_fit_text needs either 'ymin' and 'ymax', or 'y'", 
##             .call = FALSE)
##     }
##     if (is.null(data$xmin) & is.null(data$xmax) & class(params$width) != 
##         "unit") {
##         data$xmin <- data$x - params$width/2
##         data$xmax <- data$x + params$width/2
##     }
##     if (is.null(data$ymin) & is.null(data$ymax) & class(params$height) != 
##         "unit") {
##         data$ymin <- data$y - params$height/2
##         data$ymax <- data$y + params$height/2
##     }
##     data
## }
print(StatStratum$compute_panel)
## <ggproto method>
##   <Wrapper function>
##     function (...) 
## f(..., self = self)
## 
##   <Inner function (f)>
##     function (self, data, scales, decreasing = NA, reverse = TRUE, 
##     discern = FALSE, label.strata = FALSE) 
## {
##     if (label.strata) {
##         if (is.null(data$label)) {
##             data$label <- data$stratum
##         }
##         else {
##             warning("Aesthetic `label` is specified, ", "so parameter `label.strata` will be ignored.")
##         }
##     }
##     data <- subset(data, y != 0)
##     data <- auto_aggregate(data = data, by = c("x", "stratum"))
##     data <- if (is.na(decreasing)) {
##         arr_fun <- if (reverse) 
##             dplyr::desc
##         else identity
##         data[with(data, order(PANEL, x, arr_fun(stratum))), , 
##             drop = FALSE]
##     }
##     else {
##         arr_fun <- if (decreasing) 
##             dplyr::desc
##         else identity
##         data[with(data, order(PANEL, x, arr_fun(y))), , drop = FALSE]
##     }
##     data$ycum <- NA
##     for (xx in unique(data$x)) {
##         ww <- which(data$x == xx)
##         data$ycum[ww] <- cumsum(data$y[ww]) - data$y[ww]/2
##     }
##     data <- transform(data, ymin = ycum - y/2, ymax = ycum + 
##         y/2, y = ycum)
##     data$ycum <- NULL
##     data
## }

I reached the specific solutions through trial and error. They may not be the best tricks for most cases, but they demonstrate what these packages can do. For many more examples, see the respective package vignettes: for ggrepel, and for ggfittext.

Solution 1: ggrepel

ggrepel is most often (in my experience) used to repel text away from symbols in a scatterplot, in whatever directions prevent them from overlapping the symbols and each other. In this case, however, it makes much more sense to align them vertically a fixed horizontal distance (nudge_x) away from the strata and repel them vertically from each other (direction = "y") just enough to print them without overlap. It takes an extra bit of effort to render text only for the strata at the first (or at the last) axis, but the result is worth it.

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject, y = freq,
           fill = response)) +
  scale_x_discrete(expand = c(.4, 0)) +
  geom_flow(width = 1/4) +
  geom_stratum(alpha = .5, width = 1/4) +
  scale_linetype_manual(values = c("blank", "solid")) +
  ggrepel::geom_text_repel(
    aes(label = ifelse(as.numeric(survey) == 1, as.character(response), NA)),
    stat = "stratum", size = 4, direction = "y", nudge_x = -.5
  ) +
  ggrepel::geom_text_repel(
    aes(label = ifelse(as.numeric(survey) == 3, as.character(response), NA)),
    stat = "stratum", size = 4, direction = "y", nudge_x = .5
  ) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses", "labeled using `geom_text_repel()`")
## Warning: Removed 8 rows containing missing values (geom_text_repel).

## Warning: Removed 8 rows containing missing values (geom_text_repel).

Solution 2: ggfittext

ggfittext is simplicity itself: The strata are just rectangles, so no more parameter specifications are necessary to fit the text into them. One key parameter is min.size, which defaults to 4 and controls how small the text is allowed to get without being omitted.

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject, y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, 0)) +
  geom_flow(width = 1/4) +
  geom_stratum(alpha = .5, width = 1/4) +
  ggfittext::geom_fit_text(stat = "stratum", width = 1/4, min.size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses", "labeled using `geom_fit_text()`")

Note that this solution requires ggfittext v0.6.0.

Appendix

sessioninfo::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.3.2 (2016-10-31)
##  os       OS X Mavericks 10.9.5       
##  system   x86_64, darwin13.4.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2018-10-21                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package      * version date       lib source                           
##  assertthat     0.2.0   2017-04-11 [1] CRAN (R 3.3.2)                   
##  backports      1.1.2   2017-12-13 [1] CRAN (R 3.3.2)                   
##  base64enc      0.1-3   2015-07-28 [1] CRAN (R 3.3.0)                   
##  bindr          0.1.1   2018-03-13 [1] CRAN (R 3.3.2)                   
##  bindrcpp     * 0.2.2   2018-03-29 [1] CRAN (R 3.3.2)                   
##  callr          3.0.0   2018-08-24 [1] CRAN (R 3.3.2)                   
##  cli            1.0.1   2018-09-25 [1] CRAN (R 3.3.2)                   
##  colorspace     1.3-2   2016-12-14 [1] CRAN (R 3.3.2)                   
##  commonmark     1.5     2018-04-28 [1] CRAN (R 3.3.2)                   
##  crayon         1.3.4   2017-09-16 [1] CRAN (R 3.3.2)                   
##  curl           3.2     2018-03-28 [1] CRAN (R 3.3.2)                   
##  debugme        1.1.0   2017-10-22 [1] CRAN (R 3.3.2)                   
##  desc           1.2.0   2018-05-01 [1] CRAN (R 3.3.2)                   
##  devtools     * 1.13.6  2018-06-27 [1] CRAN (R 3.3.2)                   
##  digest         0.6.17  2018-09-12 [1] CRAN (R 3.3.2)                   
##  dplyr          0.7.6   2018-06-29 [1] CRAN (R 3.3.2)                   
##  evaluate       0.11    2018-07-17 [1] CRAN (R 3.3.2)                   
##  fs             1.2.6   2018-08-23 [1] CRAN (R 3.3.2)                   
##  ggalluvial   * 0.9.1   2018-10-21 [1] local (corybrunson/ggalluvial@NA)
##  ggfittext      0.6.0   2018-07-06 [1] CRAN (R 3.3.2)                   
##  ggplot2      * 3.0.0   2018-07-03 [1] CRAN (R 3.3.2)                   
##  ggrepel        0.8.0   2018-05-09 [1] CRAN (R 3.3.2)                   
##  glue           1.3.0   2018-07-17 [1] CRAN (R 3.3.2)                   
##  gtable         0.2.0   2016-02-26 [1] CRAN (R 3.3.0)                   
##  htmltools      0.3.6   2017-04-28 [1] CRAN (R 3.3.2)                   
##  httr           1.3.1   2017-08-20 [1] CRAN (R 3.3.2)                   
##  knitr          1.20    2018-02-20 [1] CRAN (R 3.3.2)                   
##  labeling       0.3     2014-08-23 [1] CRAN (R 3.3.0)                   
##  lazyeval       0.2.1   2017-10-29 [1] CRAN (R 3.3.2)                   
##  magrittr       1.5     2014-11-22 [1] CRAN (R 3.3.0)                   
##  MASS           7.3-50  2018-04-30 [1] CRAN (R 3.3.2)                   
##  memoise        1.1.0   2017-04-21 [1] CRAN (R 3.3.2)                   
##  munsell        0.5.0   2018-06-12 [1] CRAN (R 3.3.2)                   
##  pillar         1.3.0   2018-07-14 [1] CRAN (R 3.3.2)                   
##  pkgconfig      2.0.2   2018-08-16 [1] CRAN (R 3.3.2)                   
##  pkgdown      * 1.1.0   2018-06-02 [1] CRAN (R 3.3.2)                   
##  plyr           1.8.4   2016-06-08 [1] CRAN (R 3.3.0)                   
##  processx       3.2.0   2018-08-16 [1] CRAN (R 3.3.2)                   
##  ps             1.1.0   2018-08-10 [1] CRAN (R 3.3.2)                   
##  purrr          0.2.5   2018-05-29 [1] CRAN (R 3.3.2)                   
##  R6             2.2.2   2017-06-17 [1] CRAN (R 3.3.2)                   
##  RColorBrewer   1.1-2   2014-12-07 [1] CRAN (R 3.3.0)                   
##  Rcpp           0.12.18 2018-07-23 [1] CRAN (R 3.3.2)                   
##  rlang          0.2.2   2018-08-16 [1] CRAN (R 3.3.2)                   
##  rmarkdown      1.10    2018-06-11 [1] CRAN (R 3.3.2)                   
##  roxygen2       6.1.0   2018-07-27 [1] CRAN (R 3.3.2)                   
##  rprojroot      1.3-2   2018-01-03 [1] CRAN (R 3.3.2)                   
##  scales         1.0.0   2018-08-09 [1] CRAN (R 3.3.2)                   
##  sessioninfo    1.1.0   2018-09-25 [1] CRAN (R 3.3.2)                   
##  stringi        1.2.4   2018-07-20 [1] CRAN (R 3.3.2)                   
##  stringr        1.3.1   2018-05-10 [1] CRAN (R 3.3.2)                   
##  tibble         1.4.2   2018-01-22 [1] CRAN (R 3.3.2)                   
##  tidyr          0.8.1   2018-05-18 [1] CRAN (R 3.3.2)                   
##  tidyselect     0.2.4   2018-02-26 [1] CRAN (R 3.3.2)                   
##  withr          2.1.2   2018-06-23 [1] Github (jimhester/withr@dbcd7cd) 
##  xml2           1.1.1   2017-01-24 [1] CRAN (R 3.3.2)                   
##  yaml           2.2.0   2018-07-25 [1] CRAN (R 3.3.2)                   
## 
## [1] /Library/Frameworks/R.framework/Versions/3.3/Resources/library