Robust and Beautiful Statistical Visualization

How the estimation plot combines statistical rigour and visual design

Joses Ho and Adam Claridge-Chang


Statistical Visualization

What is data visualization? In W. E. B. Du Bois’s Data Portraits: Visualizing Black America (edited by Whitney Battle-Baptiste and Britt Rusert, Princeton Architectural Press, 2018), Battle-Baptiste and Rusert give a cogent and compelling definition:

[Data visualization is] the rendering of information in a visual format to help communicate data while also generating new patterns and knowledge through the act of visualization itself.

Sadly, too many figures and visualizations in modern academic publications seemingly fail to “generate new patterns and knowledge through the act of visualization itself”. Here, we propose a solution: the estimation plot.

The Inadequacy of Common Plots

The Barplot

Let’s say we have performed an experiment with 20 control subjects and 20 test subjects. We begin our data analysis by making a barplot of the data.

The barplot has several shortcomings, despite enjoying widespread usage in academic journals. We’re not the first to point out its myriad flaws. Importantly, the barplot does not show us the effect size.
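To make the point concrete, here is a minimal sketch of what a barplot with error bars actually encodes. The data are simulated, and the names (`control`, `test`, `bar_summary`) and the choice of the SEM for the error bar are our own assumptions for illustration, not part of the original experiment.

```python
import random
import statistics

# Hypothetical data: 20 control and 20 test subjects. The values are
# simulated for illustration and are not the article's actual data.
rng = random.Random(0)
control = [rng.gauss(3.0, 0.4) for _ in range(20)]
test = [rng.gauss(3.5, 0.5) for _ in range(20)]


def bar_summary(values):
    """Return the two numbers a bar with an error bar draws: mean and SEM."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem


control_bar = bar_summary(control)
test_bar = bar_summary(test)

# The effect size (here, the mean difference) is never drawn on the barplot;
# the reader has to estimate it by comparing bar heights.
mean_difference = test_bar[0] - control_bar[0]

print(f"control bar: mean={control_bar[0]:.2f}, SEM={control_bar[1]:.2f}")
print(f"test bar:    mean={test_bar[0]:.2f}, SEM={test_bar[1]:.2f}")
print(f"mean difference (not shown by the barplot): {mean_difference:.2f}")
```

Each group of 20 observations is reduced to two numbers, and the quantity we usually care about most, the difference between the groups, has to be computed separately.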

Alternatively, we can use a boxplot to visualize the data.

The Boxplot

Unfortunately, the boxplot still doesn’t show all our data. We still lack information about the underlying distribution of our data. Is it normally distributed? Is there skew in the points? What is the sample size? More importantly, boxplots do not display the effect size.
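As a rough illustration of how much a boxplot can hide, the sketch below computes the five-number summary that a box and its whiskers are built from. The three samples are made up for this example, and `five_number_summary` is a hypothetical helper. We assume whiskers that extend to the minimum and maximum, which is only one of several common conventions.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a simple boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Three made-up samples that differ in size and shape.
small = [1, 2, 3, 4, 5]                        # 5 points, evenly spread
large = small * 4                              # 20 points, same spread
bimodal = [1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 5]    # 11 points, clustered at 2 and 4

for name, sample in [("small", small), ("large", large), ("bimodal", bimodal)]:
    print(name, len(sample), five_number_summary(sample))
```

All three samples yield the same summary, so their boxplots would look identical even though the sample sizes and the shapes of the distributions differ.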

To display several data points across one or more categories, we can use the jitter plot.

The Jitter Plot

Jitter plots avoid overlapping datapoints (i.e. datapoints with the same y-value) by adding a random offset to each point along the orthogonal x-axis. Thus, while a jitter plot displays all datapoints (implicitly indicating the sample size visually), the horizontal spread is arbitrary, so it might not accurately depict the underlying distribution of the data.
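The jittering step can be sketched in a few lines. This is a minimal illustration under our own assumptions (uniform noise, a hypothetical `jitter_positions` helper, and made-up y-values), not the algorithm of any particular plotting library.

```python
import random


def jitter_positions(values, center, width=0.2, seed=None):
    """Pair each y-value with an x-position randomly offset around `center`.

    The offsets are drawn uniformly from [-width, width]; the y-values
    themselves are left untouched.
    """
    rng = random.Random(seed)
    return [(center + rng.uniform(-width, width), y) for y in values]


# Made-up observations containing several identical y-values.
ys = [2.0, 2.0, 2.0, 3.5, 3.5, 4.0]

layout_a = jitter_positions(ys, center=1, seed=1)
layout_b = jitter_positions(ys, center=1, seed=2)

for (xa, y), (xb, _) in zip(layout_a, layout_b):
    print(f"y={y:.1f}  x with seed 1: {xa:.3f}  x with seed 2: {xb:.3f}")
```

The same data produce a different horizontal layout for each seed, which is why the width of a jittered cloud should not be read as a picture of the distribution.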

Introducing the Estimation Plot