The Datasaurus data package

Steph Locke


This package wraps the awesome Datasaurus Dozen dataset.

The Datasaurus was created by Alberto Cairo in this great blog post.

The Datasaurus shows why visualising data matters, rather than relying on summary statistics alone.

The Datasaurus was subsequently made even more famous by the paper Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing by Justin Matejka and George Fitzmaurice.

In the paper, Justin and George simulate a variety of datasets that have the same summary statistics as the Datasaurus but very different distributions.
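
One way to check this claim is to compute the summary statistics for each dataset. The sketch below is illustrative rather than the package's own method: it assumes the datasauRus package is installed and that `datasaurus_dozen` has the columns `dataset`, `x` and `y` (the same columns used by the plotting code in this document). The name `stats_by_dataset` is made up for this example, and only base R functions are used.

```r
library(datasauRus)  # assumed to be installed; provides datasaurus_dozen

# Split the data by dataset name, then compute the mean, standard
# deviation and x-y correlation for each piece.
stats_by_dataset <- do.call(rbind, lapply(
  split(datasaurus_dozen, datasaurus_dozen$dataset),
  function(d) {
    data.frame(
      dataset = as.character(d$dataset[1]),
      mean_x  = mean(d$x),
      mean_y  = mean(d$y),
      sd_x    = sd(d$x),
      sd_y    = sd(d$y),
      cor_xy  = cor(d$x, d$y)
    )
  }
))

# Round the numeric columns so near-identical values are easy to compare.
stats_by_dataset[-1] <- round(stats_by_dataset[-1], 2)
print(stats_by_dataset, row.names = FALSE)
```

If the claim holds, each column should show very similar values across rows, even though the plotted datasets look nothing alike.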

This package makes these datasets available in R for use as a more advanced Anscombe’s Quartet, which is itself available in R as anscombe.
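
For comparison, the original quartet can be summarised in the same way. This is a minimal sketch, relying on the built-in `anscombe` data frame storing the four sets as columns `x1` to `x4` and `y1` to `y4`; `quartet_stats` is a name chosen for this example.

```r
# anscombe ships with base R's datasets package, so no extra packages are needed.
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set    = i,
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x   = sd(x),
    sd_y   = sd(y),
    cor_xy = cor(x, y)
  )
}))

# One row per set; the columns should be nearly identical across rows.
print(round(quartet_stats, 2))
```

The quartet has only four small sets, whereas the Datasaurus Dozen offers a larger and more varied collection built on the same idea.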


You can use the package to produce Anscombe-style plots and more.

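library(ggplot2)
library(datasauRus)
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +  # a geom layer is needed for any points to be drawn
  theme(legend.position = "none") +
  facet_wrap(~dataset, ncol = 3)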