taxize allows users to search many taxonomic data sources for species names (scientific and common) and to download upstream and downstream taxonomic hierarchy information, among other things.
The taxize tutorial can be found at https://ropensci.org/tutorials/taxize.html
The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format of service_whatitdoes. For example, gnr_resolve uses the Global Names Resolver API to resolve species names. General functions in the package that don’t hit a specific API don’t have two words separated by an underscore, e.g., classification.
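The naming convention can be illustrated with a small base-R sketch. It only splits function names mentioned in this README on the underscore; `split_taxize_name` is a hypothetical helper written for illustration and is not part of taxize.

```r
# Hypothetical helper (not part of taxize): split a function name of the
# form service_whatitdoes into its service prefix and its action.
split_taxize_name <- function(fn) {
  parts <- strsplit(fn, "_", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    # No underscore: a general function that is not tied to one API
    return(list(service = NA_character_, action = fn))
  }
  list(service = parts[1], action = paste(parts[-1], collapse = "_"))
}

split_taxize_name("gnr_resolve")     # service "gnr", action "resolve"
split_taxize_name("classification")  # service NA, action "classification"
```

A prefix obtained this way is only meaningful when it matches one of the service prefixes used by the package, since some general helpers also contain underscores.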
You need API keys for Encyclopedia of Life (EOL) and Tropicos.
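One common way to keep keys out of scripts is to store them in environment variables and read them at run time. The sketch below uses only base R. The helper `read_api_key` and the variable name `MY_TROPICOS_KEY` are hypothetical, so check the taxize documentation for the names your installed version actually looks up.

```r
# Hypothetical helper: read an API key from an environment variable and
# stop with a clear message if it has not been set.
read_api_key <- function(var) {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable '", var, "'", call. = FALSE)
  }
  key
}

# Placeholder value for illustration only; a real key comes from the provider.
Sys.setenv(MY_TROPICOS_KEY = "not-a-real-key")
tropicos_key <- read_api_key("MY_TROPICOS_KEY")
```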
Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. These include: Pan-European Species directories Infrastructure and Mycobank. Data sources that use SOAP web services have been moved to taxizesoap at https://github.com/ropensci/taxizesoap.
| Source | Function prefix | API Docs | API key |
|---|---|---|---|
| Encyclopedia of Life | eol | link | link |
| Taxonomic Name Resolution Service | tnrs | “api.phylotastic.org/tnrs” | none |
| Integrated Taxonomic Information Service | itis | link | none |
| Global Names Resolver | gnr | link | none |
| Global Names Index | gni | link | none |
| IUCN Red List | iucn | link | none |
| Tropicos | tp | link | link |
| Theplantlist dot org | tpl | ** | none |
| Catalogue of Life | col | link | none |
| National Center for Biotechnology Information | ncbi | none | none |
| CANADENSYS Vascan name search API | vascan | link | none |
| International Plant Names Index (IPNI) | ipni | link | none |
| Barcode of Life Data Systems (BOLD) | bold | link | none |
| National Biodiversity Network (UK) | nbn | link | none |
| Index Fungorum | fg | link | none |
| EU BON | eubon | link | none |
| Index of Names (ION) | ion | link | none |
| Open Tree of Life (TOL) | tol | link | none |
| World Register of Marine Species (WoRMS) | worms | link | none |
| NatureServe | natserv | link | link |
**: There are none! We suggest using the TPL and TPLck functions in the Taxonstand package. We provide two functions to get bulk data: tpl_families and tpl_get.
***: There are none! The function scrapes the web directly.
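For scripting, the prefixes in the table above can be kept in a small lookup. The sketch below hand-copies a few rows of the table into a data frame (it is not generated by taxize) and picks out the sources that list an API key. The column name `needs_key` is made up for this example.

```r
# A few rows copied by hand from the table above; this is illustrative data,
# not something taxize exports.
sources <- data.frame(
  source = c("Encyclopedia of Life",
             "Integrated Taxonomic Information Service",
             "Tropicos",
             "Catalogue of Life",
             "NatureServe"),
  prefix = c("eol", "itis", "tp", "col", "natserv"),
  needs_key = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Prefixes of the sources that list an API key in the table:
# eol, tp and natserv
key_prefixes <- sources$prefix[sources$needs_key]
key_prefixes
```

Note that, in the table, NatureServe lists an API key alongside EOL and Tropicos.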
See the newdatasource tag in the issue tracker
For more examples see the tutorial
install.packages("taxize")
For the development version from GitHub, Windows users should install Rtools first.
install.packages("devtools")
devtools::install_github("ropensci/taxize")
library('taxize')
A lot of taxize revolves around taxonomic identifiers. Because, as you know, names can be a mess (misspellings, synonyms, etc.), it’s better to get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
Classifications: think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.
out <- classification(uids)
lapply(out, head)
#> $`315576`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
#>
#> $`492549`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.
children("Salmo", db = 'ncbi')
#> $Salmo
#> childtaxa_id childtaxa_name childtaxa_rank
#> 1 1509524 Salmo marmoratus x Salmo trutta species
#> 2 1484545 Salmo cf. cenerinus BOLD:AAB3872 species
#> 3 1483130 Salmo zrmanjaensis species
#> 4 1483129 Salmo visovacensis species
#> 5 1483128 Salmo rhodanensis species
#> 6 1483127 Salmo pellegrini species
#> 7 1483126 Salmo opimus species
#> 8 1483125 Salmo macedonicus species
#> 9 1483124 Salmo lourosensis species
#> 10 1483123 Salmo labecula species
#> 11 1483122 Salmo farioides species
#> 12 1483121 Salmo chilo species
#> 13 1483120 Salmo cettii species
#> 14 1483119 Salmo cenerinus species
#> 15 1483118 Salmo aphelios species
#> 16 1483117 Salmo akairos species
#> 17 1201173 Salmo peristericus species
#> 18 1035833 Salmo ischchan species
#> 19 700588 Salmo labrax species
#> 20 237411 Salmo obtusirostris species
#> 21 235141 Salmo platycephalus species
#> 22 234793 Salmo letnica species
#> 23 62065 Salmo ohridanus species
#> 24 33518 Salmo marmoratus species
#> 25 33516 Salmo fibreni species
#> 26 33515 Salmo carpio species
#> 27 8032 Salmo trutta species
#> 28 8030 Salmo salar species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
Get all species in the genus Apis.
downstream("Apis", db = 'itis', downto = 'Species', verbose = FALSE)
#> $Apis
#> tsn parentname parenttsn taxonname rankid rankname
#> 1 154396 Apis 154395 Apis mellifera 220 species
#> 2 763550 Apis 154395 Apis andreniformis 220 species
#> 3 763551 Apis 154395 Apis cerana 220 species
#> 4 763552 Apis 154395 Apis dorsata 220 species
#> 5 763553 Apis 154395 Apis florea 220 species
#> 6 763554 Apis 154395 Apis koschevnikovi 220 species
#> 7 763555 Apis 154395 Apis nigrocincta 220 species
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
Get all genera up from the species Pinus contorta (this includes the genus of the species and its co-genera within the same family).
upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#> $`Pinus contorta`
#> tsn parentname parenttsn taxonname rankid rankname
#> 1 18031 Pinaceae 18030 Abies 180 genus
#> 2 18033 Pinaceae 18030 Picea 180 genus
#> 3 18035 Pinaceae 18030 Pinus 180 genus
#> 4 183396 Pinaceae 18030 Tsuga 180 genus
#> 5 183405 Pinaceae 18030 Cedrus 180 genus
#> 6 183409 Pinaceae 18030 Larix 180 genus
#> 7 183418 Pinaceae 18030 Pseudotsuga 180 genus
#> 8 822529 Pinaceae 18030 Keteleeria 180 genus
#> 9 822530 Pinaceae 18030 Pseudolarix 180 genus
#>
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
Get synonyms for a name.
synonyms("Acer drummondii", db="itis")
#> $`Acer drummondii`
#> sub_tsn acc_name acc_tsn
#> 1 183671 Acer rubrum var. drummondii 526853
#> 2 183671 Acer rubrum var. drummondii 526853
#> 3 183671 Acer rubrum var. drummondii 526853
#> acc_author syn_author
#> 1 (Hook. & Arn. ex Nutt.) Sarg. (Hook. & Arn. ex Nutt.) E. Murray
#> 2 (Hook. & Arn. ex Nutt.) Sarg. Hook. & Arn. ex Nutt.
#> 3 (Hook. & Arn. ex Nutt.) Sarg. (Hook. & Arn. ex Nutt.) Small
#> syn_name syn_tsn
#> 1 Acer rubrum ssp. drummondii 28730
#> 2 Acer drummondii 183671
#> 3 Rufacer drummondii 183672
#>
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"
Get taxonomic IDs for a name from several data sources at once.
get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), verbose=FALSE)
#> $itis
#> Salvelinus fontinalis
#> "162003"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#> attr(,"class")
#> [1] "tsn"
#>
#> $ncbi
#> Salvelinus fontinalis
#> "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/8038"
#>
#> attr(,"class")
#> [1] "ids"
You can limit results to certain rows when getting ids with any of the get_*() functions.
get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.gbif.org/species/2704179"
#>
#> attr(,"class")
#> [1] "ids"
Furthermore, you can just get back all ids, if that’s your jam, with the get_*_() functions (the get_*() functions with an additional underscore at the end of the function name).
get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#> guid scientificName rank taxonomicStatus
#> 1 NBNSYS0000027573 Chironomus riparius species accepted
#> 2 NHMSYS0001718585 Cryptohypnus riparius species synonym
#> 3 NHMSYS0000864966 Damaeus (Damaeus) riparius species accepted
#>
#> $nbn$`Pinus contorta`
#> guid scientificName rank
#> 1 NBNSYS0000004786 Pinus contorta species
#> 2 NHMSYS0000494848 Pinus contorta subsp. contorta subspecies
#> 3 NHMSYS0000494858 Pinus contorta subsp. murreyana subspecies
#> taxonomicStatus
#> 1 accepted
#> 2 synonym
#> 3 synonym
#>
#>
#> attr(,"class")
#> [1] "ids"
Get common names from scientific names.
sci2comm('Helianthus annuus', db = 'itis')
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower" "wild sunflower"
#> [4] "annual sunflower"
Get scientific names from common names.
comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus thibetanus" "Ursus thibetanus"
#> [3] "Ursus americanus luteolus" "Ursus americanus americanus"
#> [5] "Ursus americanus" "Ursus americanus"
#> [7] "Chiropotes satanas"
Get the lowest common taxon and rank among several taxa.
spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#> name rank id
#> 21 Boreoeutheria below-class 1437010
numeric to uid
as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
list to uid
as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339" "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "http://www.ncbi.nlm.nih.gov/taxonomy/3339"
#> [3] "http://www.ncbi.nlm.nih.gov/taxonomy/9696"
uid to data.frame
out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#> ids class match multiple_matches pattern_match
#> 1 315567 uid found FALSE FALSE
#> 2 3339 uid found FALSE FALSE
#> 3 9696 uid found FALSE FALSE
#> uri
#> 1 http://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2 http://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3 http://www.ncbi.nlm.nih.gov/taxonomy/9696
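The identifier objects printed in these examples are character vectors that carry their metadata as attributes. The following base-R sketch builds a stand-in object by hand, copying values from the printed output rather than calling taxize, to show how those attributes can be read and gathered into a data frame. The name `fake_uid` is made up for illustration, and the class attribute is deliberately left off so that no taxize methods are involved.

```r
# Stand-in for the printed as.uid() result; built by hand, not by taxize.
fake_uid <- structure(
  c("315567", "3339", "9696"),
  match = c("found", "found", "found"),
  multiple_matches = c(FALSE, FALSE, FALSE),
  pattern_match = c(FALSE, FALSE, FALSE)
)

# Individual attributes are read with attr()
attr(fake_uid, "match")

# Gather the ids and some of their attributes into one data frame;
# as.vector() drops the attributes and keeps the plain ids.
res <- data.frame(
  ids = as.vector(fake_uid),
  match = attr(fake_uid, "match"),
  multiple_matches = attr(fake_uid, "multiple_matches"),
  stringsAsFactors = FALSE
)
nrow(res)  # one row per id
```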
Check out our milestones to see what we plan to get done for each version.
Get citation information for taxize in R by running citation(package = 'taxize')