Case study - resolving Species Plantarum names

Tim (@fozy81)


Species Plantarum by Carl Linnaeus was originally published in 1753 and was the first work to consistently describe species using binominal names. It provides a reference point for charting how the usage of the 5940 taxa described within it has changed over time (a subset of 500 is used below). So let’s use the taxize package to see if we can resolve these names and find their current status.

Firstly, let’s see how many of the names still exist in modern species catalogues. We can use the resolve function to query the Global Names Resolver.

# load taxize for the name resolution functions and knitr for kable()
library(taxize)
library(knitr)
# load a subset of Species Plantarum names from the taxize package
names <- species_plantarum_binomials[1:500,]
# Species Plantarum names are split into genus and epithet, so they need to be pasted
# together to allow the full binominal names to be sent to the Global Names Resolver
names$species <- paste(names$genus, names$epithet)
# use resolve function to send binominal names to resolver (this may take some time)
resolved_names <- resolve(names$species)
# select only the data frame element from the list sent back by the resolver (gnr)
resolved_names <- resolved_names$gnr
# count the distinct names resolved per dataset source
resolved_names_source <- resolved_names[,
  c("submitted_name", "data_source_title")]
resolved_names_source <- unique(resolved_names_source)
resolved_names_source$count <- 1
summary_resolved <- aggregate(resolved_names_source$count,
    list(data_source = resolved_names_source$data_source_title), FUN = "sum")
summary_resolved <- summary_resolved[with(summary_resolved, order(-x)), ]
kable(head(summary_resolved, 20))
     data_source                                            x
40   uBio NameBank                                        499
19   GBIF Backbone Taxonomy                               491
5    Catalogue of Life                                    481
1    Arctos                                               465
36   The International Plant Names Index                  460
39   Tropicos - Missouri Botanical Garden                 422
21   GRIN Taxonomy for Plants                             319
4                                                         288
11   Encyclopedia of Life                                 259
33   Open Tree of Life Reference Taxonomy                 249
35   The Interim Register of Marine and Nonmarine Genera  245
27   ITIS                                                 239
12   EUNIS                                                222
30   NCBI                                                 218
31   nlbif                                                184
42   USDA NRCS PLANTS Database                            176
23   iNaturalist                                          141
16   Freebase                                             135
41   Union 4                                              133
10   Database of Vascular Plants of Canada (VASCAN)       123
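
The counting step above can be hard to follow without live resolver output, so here is a minimal offline sketch of the same idea. It assumes a toy data frame with the two columns used above (submitted_name and data_source_title); the species names, sources and counts are illustrative only and are not real resolver results. It also uses table() rather than the aggregate() call above, which is simply an alternative way to get the same tally.

```r
# Toy stand-in for the resolver output (illustrative values, not real results)
toy <- data.frame(
  submitted_name = c("Acer rubrum", "Acer rubrum", "Acer rubrum",
                     "Salix alba", "Salix alba", "Poa annua"),
  data_source_title = c("ITIS", "ITIS", "NCBI",
                        "ITIS", "NCBI", "NCBI"),
  stringsAsFactors = FALSE
)

# Drop duplicate name/source pairs so each name counts once per source
pairs <- unique(toy[, c("submitted_name", "data_source_title")])

# Tabulate the number of distinct names per source, largest first
counts <- sort(table(pairs$data_source_title), decreasing = TRUE)
print(counts)
```

The unique() step matters because a source can return several records for one submitted name; without it the tally would count records rather than names.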

The table above shows that no single data source can resolve all of the names in Species Plantarum. Let’s see whether the data sources, taken together, can resolve all the names.

gnr_names <- resolved_names[, c("submitted_name", "score")] 
gnr_names <- unique(gnr_names)

merge_gnr_plantarum <- merge(names, gnr_names,
  by.x = "species", by.y = "submitted_name", all.x = TRUE)

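# average the match scores for each name across all data sources
merge_gnr_plantarum <- aggregate(merge_gnr_plantarum$score,
  list(species = merge_gnr_plantarum$species), FUN = "mean")

# count the names with at least one match score
# (assumed call; only its output survives in the source)
sum(!is.na(merge_gnr_plantarum$x))
#> [1] 500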
min(merge_gnr_plantarum$x, na.rm = TRUE)
#> [1] 0.75
max(merge_gnr_plantarum$x, na.rm = TRUE)
#> [1] 0.9915

Even though no individual data source can resolve all the names, taken together all 500 names from Species Plantarum can be resolved using modern sources. Note, however, that some names score poorly in terms of fuzzy matching, with mean scores as low as 0.75. The poor scores could indicate imperfect matches, and perhaps some matches are due to species described after 1753 sharing similar names. There are lots of questions regarding the history of many of these original binominals and their journey into present-day taxonomy.
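
One way to follow up on the poorly scoring names is to list them for manual checking. The sketch below is an assumption-laden illustration rather than part of the original analysis: it uses a toy data frame shaped like the aggregated scores (a species column and a mean score column x), made-up values, and a hypothetical threshold of 0.9 that would need to be chosen with care for real data.

```r
# Toy per-name mean scores (illustrative values, not real resolver output)
scores <- data.frame(
  species = c("Acer rubrum", "Salix alba", "Poa annua", "Bellis perennis"),
  x = c(0.988, 0.75, 0.9915, NA),
  stringsAsFactors = FALSE
)

# Hypothetical cut-off below which a match is treated as doubtful
threshold <- 0.9

# Names with no score at all were not matched by any source
unresolved <- scores$species[is.na(scores$x)]

# Names whose mean score falls below the threshold, worst first
low_score <- scores[!is.na(scores$x) & scores$x < threshold, ]
low_score <- low_score[order(low_score$x), ]

print(unresolved)
print(low_score)
```

Keeping the unmatched names separate from the low-scoring ones avoids treating a missing score as a poor score.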

For example, let’s see how many of these names are still the ‘accepted’ names. Not all the data providers give an accepted name, so let’s use ITIS, as the get_tsn_ function returns a nameUsage variable which indicates whether a name is still accepted or not.

# find resolved species Plantarum names returned from ITIS
itis_resolved <- resolved_names$submitted_name[resolved_names$data_source_title == "ITIS"]
# select random sub-selection to save time
itis_sample <- sample(unique(itis_resolved), 200)
# query ITIS (this may take some time)
its_names <- get_tsn_(itis_sample)
# combine the list of per-name data frames into a single data frame
its_names_bind <- do.call("rbind", its_names)
its_names_bind$n <- 1
# count the number of names in each nameUsage category 
its_names_agg <- aggregate(its_names_bind$n,
  list(data_source = its_names_bind$nameUsage), FUN = "sum")
kable(its_names_agg)
data_source    x
accepted     195
valid          3

Of the 200 randomly selected Species Plantarum names found in the ITIS data source, nearly all are still listed as the ‘accepted’ name. This suggests that Carl Linnaeus, as is widely acknowledged, identified well-thought-out and pragmatic taxonomic concepts which have proved useful until the present day.
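
To express the nameUsage counts as shares rather than raw totals, something like the following could be used. This is only a sketch on a made-up vector of nameUsage values (one per returned record); the proportions are illustrative and do not reproduce the ITIS results above.

```r
# Toy vector of nameUsage values, one per returned record (illustrative only)
usage <- c(rep("accepted", 8), "not accepted", "valid")

# Count the records in each category
usage_counts <- table(usage)

# Convert the counts to percentages of all records
usage_pct <- round(100 * prop.table(usage_counts), 1)
print(usage_pct)
```

Note that these are shares of records, not of submitted names, since a single name can return more than one record.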

Further research may reveal why some names appear not to be used in some data sources today and, where they are used, why some are no longer the accepted name.
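
A possible starting point for that research is to list, for each data source, the submitted names it did not return. The sketch below assumes a toy data frame with the same two columns as the resolver output used earlier (submitted_name and data_source_title); the names and sources are illustrative only.

```r
# Full set of submitted names (illustrative only)
submitted <- c("Acer rubrum", "Salix alba", "Poa annua")

# Toy stand-in for the resolver output (not real results)
toy <- data.frame(
  submitted_name = c("Acer rubrum", "Salix alba", "Acer rubrum", "Poa annua"),
  data_source_title = c("ITIS", "ITIS", "NCBI", "NCBI"),
  stringsAsFactors = FALSE
)

# Group the matched names by source, then keep the submitted names each source lacks
found_by_source <- split(toy$submitted_name, toy$data_source_title)
missing_by_source <- lapply(found_by_source,
                            function(found) setdiff(submitted, found))
print(missing_by_source)
```

The resulting list has one element per data source, each holding the names that would need a closer look in that source.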