This vignette documents some functions and specificities that were not presented in the main vignette of the package. It is mainly aimed at advanced users of the GIFT database.
All functions in the package take a GIFT_version argument. This argument lets you retrieve different instances of the GIFT database and therefore makes all previous studies using the GIFT database reproducible. For example, the version used in Weigelt et al. (2020) is "1.0". For more information about the content of the different versions, see the Version Log tab on the GIFT website.
To access all the available versions of the database, you can run the following function:
library("GIFT")
library("knitr")
library("kableExtra")
versions <- GIFT_versions()
kable(versions, "html") %>%
  kable_styling(full_width = FALSE)
ID | version | description |
---|---|---|
1 | 1.0 | Data included in and workflows used to assemble GIFT 1.0 are described in detail in: Weigelt, P., König, C. & Kreft, H. (2020) GIFT – A Global Inventory of Floras and Traits for macroecology and biogeography. Journal of Biogeography, 47, 16-43. doi: 10.1111/jbi.13623 |
2 | 2.0 | New checklist and trait data included for Europe, the Mediterranean, temperate Asia, Panama, Japan, Java, New Zealand, Easter Island and the Torres Strait Islands. Updated workflows to document biases in the distribution of trait data; Updated taxonomic trait derivation; Final trait values and agreement scores for trait values from several resources are now calculated separately including and excluding restricted resources. |
3 | 2.1 | New checklists and traits included for the Americas, Crimea, Madagascar, Arabian peninsula, Laos, Bhutan, India, China, Sunda-Sahul shelf, Tonga, Canary Islands, West Africa and for ferns and palms globally. Large categorical trait data included from Try. |
4 | 2.2 | New checklists (with a focus on endemic species) and traits for various oceanic archipelagos (Cook Islands, Madeira, Arctic Islands, Cayman Islands, Comores, Juan Fernandez, Palau, Galapagos, Frisian Islands, Antilles, Japan, Mayotte, Fiji, Taiwan, etc.) and various mainland regions (Equatorial Guinea and the entire former USSR in sub-regions). |
The version column of this table holds the values to use when you want to retrieve past versions of the GIFT database. By default, the argument is set to GIFT_version = "latest", which points to the current latest stable version of the database ("2.0" in October 2022).
The function GIFT_lists() can be run to retrieve metadata about the GIFT checklists. In the next chunk, we call it with different values for the GIFT_version argument.
list_latest <- GIFT_lists(GIFT_version = "latest") # default value
list_1 <- GIFT_lists(GIFT_version = "1.0")
Version 1.0 contained 3122 checklists, whereas version 2.0 contains 4475.
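Beyond comparing counts, it can be useful to see which checklists were added between two versions. The following is a minimal sketch on toy tables standing in for two outputs of GIFT_lists(); the list_ID values are invented, and the only assumption is that both tables carry a list_ID column.

```r
# Toy stand-ins for two GIFT_lists() outputs; all list_ID values are invented
list_old <- data.frame(list_ID = c(1, 2, 3))
list_new <- data.frame(list_ID = c(1, 2, 3, 4, 5))

# Checklists present in the newer table but absent from the older one
added <- setdiff(list_new$list_ID, list_old$list_ID)
added
length(added)
```

With real data, list_old and list_new would be replaced by the two objects retrieved with different GIFT_version values.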
When using the GIFT database in a research article, it is good practice to cite the references used and to list them in an appendix. The following function retrieves the reference for each checklist, along with some metadata. References are documented in the column ref_long.
ref <- GIFT_references()
ref <- ref[which(ref$ref_ID %in% c(22, 10333, 10649)),
           c("ref_ID", "ref_long", "geo_entity_ref")]
# The three selected references
kable(ref, "html") %>%
  kable_styling(full_width = FALSE)
row | ref_ID | ref_long | geo_entity_ref |
---|---|---|---|
4 | 10649 | Pavlov (1954-1966) Flora Kazakhstana. Nauka Kazakhskoy SSR, Alma-Ata, Kazakhstan. | Kazakhstan |
179 | 10333 | Zizka (1991) Flowering plants of Easter Island. Palmarum hortus francofurtensis 3, 3-108. | Easter Island |
772 | 22 | Kirchner, Picot, Merceron & Gigot (2010) Flore vasculaire de La Réunion. Conservatoire Botanique National de Mascarin, Réunion; France. | La Réunion |
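To turn such a table into an appendix, the reference strings can be de-duplicated, sorted, and numbered. The sketch below rebuilds the three rows shown above by hand so that it runs offline; the numbering layout is only one possible choice, not a format prescribed by GIFT.

```r
# The three references displayed above, re-entered by hand
ref <- data.frame(
  ref_ID = c(10649, 10333, 22),
  ref_long = c(
    "Pavlov (1954-1966) Flora Kazakhstana. Nauka Kazakhskoy SSR, Alma-Ata, Kazakhstan.",
    "Zizka (1991) Flowering plants of Easter Island. Palmarum hortus francofurtensis 3, 3-108.",
    "Kirchner, Picot, Merceron & Gigot (2010) Flore vasculaire de La Réunion. Conservatoire Botanique National de Mascarin, Réunion; France."
  )
)

# Unique references in alphabetical order, then printed as a numbered list
appendix <- sort(unique(ref$ref_long))
writeLines(paste0(seq_along(appendix), ". ", appendix))
```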
The main wrapper function to retrieve checklists and their species composition is GIFT_checklists(), but you can also retrieve individual checklists using GIFT_checklists_raw(). To do so, you need to know the identification number (list_ID) of the checklists you want to retrieve.
To quickly see all the list_ID values available in the database, you can run GIFT_lists() as shown in Section 1.
When calling GIFT_checklists_raw(), you can set the argument namesmatched to TRUE to get extra columns describing the taxonomic harmonization that was performed when the list was uploaded to the GIFT database.
listID_1 <- GIFT_checklists_raw(list_ID = c(11926))
listID_1_tax <- GIFT_checklists_raw(list_ID = c(11926), namesmatched = TRUE)

ncol(listID_1) # 16 columns
ncol(listID_1_tax) # 33 columns
length(unique(listID_1$work_ID)); length(unique(listID_1_tax$orig_ID))
In the list we retrieved, you can see that we "lost" some species during taxonomic harmonization: we went from 1331 names in the source to 1106 afterwards. This means that several names were treated as synonyms, or as unknown plant species, in the taxonomic backbone used for harmonization.
Note: the service mainly used to taxonomically harmonize the species' names was The Plant List up to version 2.0 and the World Checklist of Vascular Plants afterwards.
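The drop in the number of names can be reproduced on a toy example. The sketch below assumes, as in the chunk above, that orig_ID identifies a name as written in the source and work_ID the name after harmonization; all ID values are invented for illustration.

```r
# Toy checklist: five original names mapped onto three harmonized names
# (ID values are invented; real ones come from GIFT_checklists_raw())
checklist <- data.frame(
  orig_ID = 1:5,
  orig_name = c("Fagus silvatica", "Fagus sylvatica", "Quercus robur",
                "Quercus pedunculata", "Betula pendula"),
  work_ID = c(101, 101, 102, 102, 103)
)

n_orig <- length(unique(checklist$orig_ID)) # names in the source
n_work <- length(unique(checklist$work_ID)) # names after harmonization
c(original = n_orig, harmonized = n_work)

# Original names that collapsed onto an already used harmonized name
checklist$orig_name[duplicated(checklist$work_ID)]
```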
In the main vignette, we illustrated how to retrieve checklists falling within a provided shapefile, using the western Mediterranean basin shape provided with the GIFT R package.
data("western_mediterranean")
Here we provide more details on the different values the overlap argument can take, using the function GIFT_spatial(). The following figure illustrates how this argument works:
Figure 1. GIFT spatial
We now illustrate this by retrieving checklists falling in the western Mediterranean basin using the four options available.
med_centroid_inside <- GIFT_spatial(shp = western_mediterranean,
                                    overlap = "centroid_inside")
med_extent_intersect <- GIFT_spatial(shp = western_mediterranean,
                                     overlap = "extent_intersect")
med_shape_intersect <- GIFT_spatial(shp = western_mediterranean,
                                    overlap = "shape_intersect")
med_shape_inside <- GIFT_spatial(shp = western_mediterranean,
                                 overlap = "shape_inside")
length(unique(med_extent_intersect$entity_ID))
length(unique(med_shape_intersect$entity_ID))
length(unique(med_centroid_inside$entity_ID))
length(unique(med_shape_inside$entity_ID))
Here we see that we progressively lose lists as we apply more selective criteria to the spatial overlap. The most restrictive option is overlap = "shape_inside" with 72 regions, followed by overlap = "centroid_inside" with 84 regions and overlap = "shape_intersect" with 104 regions; the least restrictive option is overlap = "extent_intersect" with 108 regions.
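The nesting of the four criteria can be checked offline on toy geometries. The sketch below uses the sf package with an invented triangular query shape and five invented square regions; it assumes that "extent" refers to bounding boxes and that the criteria map onto sf predicates as written in the comments, which may differ from how GIFT_spatial() is implemented internally.

```r
library(sf)

# Helper building an axis-aligned square polygon (illustration only)
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Invented query shape: a right triangle
query <- st_sfc(st_polygon(list(rbind(c(0, 0), c(10, 0), c(0, 10), c(0, 0)))))

# Invented regions A to E
regions <- st_sfc(
  square(1, 1, 3, 3),     # A: entirely inside the triangle
  square(2, 2, 7, 7),     # B: centroid inside, one corner outside
  square(4, 4, 9, 9),     # C: overlaps, centroid outside
  square(8, 8, 9.5, 9.5), # D: only inside the bounding box of the triangle
  square(20, 20, 22, 22)  # E: far away
)

# Bounding box of each geometry, returned as polygons
bbox_polygons <- function(x) {
  do.call(c, lapply(seq_along(x), function(i) st_as_sfc(st_bbox(x[i]))))
}

overlap <- data.frame(
  region = LETTERS[1:5],
  extent_intersect = st_intersects(bbox_polygons(regions),
                                   bbox_polygons(query), sparse = FALSE)[, 1],
  shape_intersect = st_intersects(regions, query, sparse = FALSE)[, 1],
  centroid_inside = st_within(st_centroid(regions), query,
                              sparse = FALSE)[, 1],
  shape_inside = st_within(regions, query, sparse = FALSE)[, 1]
)
overlap

# Number of regions retained under each criterion
sapply(overlap[-1], sum)
```

On these toy shapes, the number of retained regions decreases from extent_intersect to shape_inside, mirroring the ordering observed for the western Mediterranean basin.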
Using the function GIFT_shapes() with the entity_IDs retrieved in each instance, we can download the shapefiles for each region.
geodata_extent_intersect <- GIFT_shapes(med_extent_intersect$entity_ID)

geodata_shape_inside <-
  geodata_extent_intersect[which(geodata_extent_intersect$entity_ID %in%
                                   med_shape_inside$entity_ID), ]
geodata_centroid_inside <-
  geodata_extent_intersect[which(geodata_extent_intersect$entity_ID %in%
                                   med_centroid_inside$entity_ID), ]
geodata_shape_intersect <-
  geodata_extent_intersect[which(geodata_extent_intersect$entity_ID %in%
                                   med_shape_intersect$entity_ID), ]
And then make a map.
par_overlap <- par(mfrow = c(2, 2), mai = c(0, 0, 0.5, 0))
plot(sf::st_geometry(geodata_shape_inside),
col = geodata_shape_inside$entity_ID,
main = paste("shape inside\n",
length(unique(med_shape_inside$entity_ID)),
"polygons"))
plot(sf::st_geometry(western_mediterranean), lwd = 2, add = TRUE)
plot(sf::st_geometry(geodata_centroid_inside),
col = geodata_centroid_inside$entity_ID,
main = paste("centroid inside\n",
length(unique(med_centroid_inside$entity_ID)),
"polygons"))
points(geodata_centroid_inside$point_x, geodata_centroid_inside$point_y)
plot(sf::st_geometry(western_mediterranean), lwd = 2, add = TRUE)
plot(sf::st_geometry(geodata_shape_intersect),
col = geodata_shape_intersect$entity_ID,
main = paste("shape intersect\n",
length(unique(med_shape_intersect$entity_ID)),
"polygons"))
plot(sf::st_geometry(western_mediterranean), lwd = 2, add = TRUE)
plot(sf::st_geometry(geodata_extent_intersect),
col = geodata_extent_intersect$entity_ID,
main = paste("extent intersect\n",
length(unique(med_extent_intersect$entity_ID)),
"polygons"))
plot(sf::st_geometry(western_mediterranean), lwd = 2, add = TRUE)
par(par_overlap)
GIFT comprises many polygons, and for some regions several polygons overlap. The main vignette explains how to remove overlapping polygons and describes the associated parameters. Here we provide further details:
length(med_shape_inside$entity_ID)
## [1] 72
length(GIFT_no_overlap(med_shape_inside$entity_ID, area_threshold_island = 0,
area_threshold_mainland = 100, overlap_threshold = 0.1))
## [1] 53
# entity_IDs retained after removing overlapping polygons:
GIFT_no_overlap(med_shape_inside$entity_ID, area_threshold_island = 0,
area_threshold_mainland = 100, overlap_threshold = 0.1)
## [1] 145 146 147 148 149 150 151 414 415 416 417 547
## [13] 548 549 550 551 552 586 591 592 736 738 739 10001
## [25] 10072 10104 10184 10303 10422 10430 10978 11029 11030 11031 11033 11035
## [37] 11038 11039 11042 11044 11045 11046 11434 11474 11477 11503 12231 12232
## [49] 12233 12632 12633 12634 12635
# Example of two overlapping polygons: Spain mainland and Andalusia
overlap_shape <- GIFT_shapes(entity_ID = c(10071, 12078))
par_overlap_shp <- par(mfrow = c(1, 1))
plot(sf::st_geometry(overlap_shape),
col = c(rgb(red = 1, green = 0, blue = 0, alpha = 0.5),
rgb(red = 0, green = 0, blue = 1, alpha = 0.3)),
lwd = c(2, 1),
main = "Overlapping polygons")
par(par_overlap_shp)
GIFT_no_overlap(c(10071, 12078), area_threshold_island = 0,
area_threshold_mainland = 100, overlap_threshold = 0.1)
## [1] 12078
GIFT_no_overlap(c(10071, 12078), area_threshold_island = 0,
area_threshold_mainland = 100000, overlap_threshold = 0.1)
## [1] 10071
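The two calls above suggest the following decision rule for a pair of overlapping mainland polygons: the smaller polygon is kept when its area exceeds the area threshold, otherwise the larger one is kept. The sketch below encodes that reading in a hypothetical helper, choose_polygon(), which is not part of the GIFT package. The areas are rough illustrative values in km², the overlap value is invented, and it is assumed that 12078 is the smaller polygon (Andalusia).

```r
# Hypothetical helper mimicking the decision for ONE pair of overlapping
# polygons; this is a reading of the outputs above, not GIFT's actual code
choose_polygon <- function(small, large, overlap, area_threshold,
                           overlap_threshold = 0.1) {
  # Below the overlap threshold, both polygons are kept
  if (overlap <= overlap_threshold) {
    return(c(small$entity_ID, large$entity_ID))
  }
  # Otherwise keep the smaller polygon if it is large enough, else the larger
  if (small$area > area_threshold) small$entity_ID else large$entity_ID
}

# Rough, illustrative areas in km2 (assumptions, not values taken from GIFT)
andalusia <- list(entity_ID = 12078, area = 87000)
spain <- list(entity_ID = 10071, area = 493000)

kept_low <- choose_polygon(andalusia, spain, overlap = 1,
                           area_threshold = 100)
kept_high <- choose_polygon(andalusia, spain, overlap = 1,
                            area_threshold = 100000)
kept_low
kept_high
```

Under these assumptions the helper returns the same two IDs as the calls above, 12078 for the low threshold and 10071 for the high one.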
In GIFT_checklists(), it is also possible to remove overlapping polygons only if they belong to the same reference (i.e. the same ref_ID). The following example shows how this works:
ex <- GIFT_checklists(taxon_name = "Tracheophyta", by_ref_ID = FALSE,
                      list_set_only = TRUE)
ex2 <- GIFT_checklists(taxon_name = "Tracheophyta",
                       remove_overlap = TRUE, by_ref_ID = TRUE,
                       list_set_only = TRUE)
ex3 <- GIFT_checklists(taxon_name = "Tracheophyta",
                       remove_overlap = TRUE, by_ref_ID = FALSE,
                       list_set_only = TRUE)
length(unique(ex$lists$ref_ID))
length(unique(ex2$lists$ref_ID))
length(unique(ex3$lists$ref_ID))
Asking for checklists of vascular plants, we get 367 references without any overlap criterion (ex), 360 if we remove overlapping polygons at the reference level (ex2), and 335 if we remove all overlapping polygons (ex3).
So what is the difference between the second and third case? Let's look at the references that are present in the second but not in the third example.
unique(ex2$lists$ref_ID)[!(unique(ex2$lists$ref_ID) %in%
unique(ex3$lists$ref_ID))] # 25 references
25 references are in the second but not in the third example. If we look at one of the references listed, ref_ID = 10143, we see that it is a checklist for the Pilbara region in Australia. Its entity_ID is 10043. Looking at the GIFT website, we see that other regions can overlap with it.
# Pilbara region (Australia) and overlapping shapes
pilbara <- GIFT_shapes(entity_ID = c(10043, 12172, 11398, 11391, 10918))
ggplot(pilbara) +
  geom_sf(aes(fill = as.factor(entity_ID)), alpha = 0.5) +
  scale_fill_brewer("entity_ID", palette = "Set1")
Since these polygons do not belong to the same ref_ID, they are all kept when by_ref_ID = TRUE, but they are subject to overlap removal when by_ref_ID = FALSE.
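The effect of by_ref_ID can be mimicked on a toy table. In the sketch below, the table, the overlap_group column, and the rule "keep the smallest polygon of each group" are all simplifying assumptions made for illustration; keep_smallest() is a hypothetical helper and does not reproduce the internals of GIFT_checklists().

```r
# Toy polygons: overlap_group marks polygons that overlap each other
# (all values are invented)
polys <- data.frame(
  entity_ID = c(1, 2, 3, 4),
  ref_ID = c("A", "A", "B", "C"),
  overlap_group = c(1, 1, 1, 2),
  area = c(500, 50, 200, 80)
)

# Hypothetical helper: keep the smallest polygon per overlap group,
# optionally resolving overlaps only among polygons sharing a ref_ID
keep_smallest <- function(df, by_ref_ID) {
  key <- if (by_ref_ID) {
    interaction(df$overlap_group, df$ref_ID, drop = TRUE)
  } else {
    factor(df$overlap_group)
  }
  df[df$area == ave(df$area, key, FUN = min), ]
}

kept_all <- keep_smallest(polys, by_ref_ID = FALSE)
kept_ref <- keep_smallest(polys, by_ref_ID = TRUE)

unique(kept_all$ref_ID) # reference "B" is lost
unique(kept_ref$ref_ID) # reference "B" is retained
```

As with the Pilbara checklist, a reference can disappear entirely when overlaps are resolved across references, whereas it survives when overlaps are resolved only within each reference.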
All the plant species present in the GIFT database can be retrieved using GIFT_species().
species <- GIFT_species()
To add further information, such as the order or family, we can call GIFT_taxgroup().
# Add Family
species$Family <- GIFT_taxgroup(
  as.numeric(species$work_ID), taxon_lvl = "family", return_ID = FALSE,
  species = species)
Order or higher levels can also be retrieved.
GIFT_taxgroup(as.numeric(species$work_ID[1:5]), taxon_lvl = "order",
return_ID = FALSE)
GIFT_taxgroup(as.numeric(species$work_ID[1:5]),
taxon_lvl = "higher_lvl", return_ID = FALSE,
species = species)
As said above, plant species names can vary from the original sources they come from to the final work_species name they get, due to the taxonomic harmonization procedure. Looking up a species and the different steps of its taxonomic harmonization is possible with the function GIFT_species_lookup().
Fagus <- GIFT_species_lookup(genus = "Fagus", epithet = "sylvatica",
                             namesmatched = TRUE)
In this table, we can see that the first entry, Fagus silvatica, was later converted to the accepted name Fagus sylvatica.
The taxonomy used in the GIFT database can be downloaded using GIFT_taxonomy().
taxo <- GIFT_taxonomy()
As other global databases of plant diversity exist and may rely on different polygons, we provide a function, GIFT_overlap(), that can look at the spatial overlap between GIFT polygons and polygons coming from other databases.
So far, only two resources are available: glonaf and gmba. glonaf stands for Global Naturalized Alien Flora and gmba for Global Mountain Biodiversity Assessment.
GIFT_overlap() returns the spatial overlap, expressed as a proportion, for each pairwise combination of polygons between GIFT and the other resource.
glonaf <- GIFT_overlap(resource = "glonaf")

kable(glonaf[1:5, ], "html") %>%
  kable_styling(full_width = FALSE)
entity_ID | glonaf_ID | overlap12 | overlap21 |
---|---|---|---|
10535 | 2 | 0.0000234 | 0.0003414 |
10535 | 9 | 0.0000063 | 0.0001627 |
10535 | 43 | 0.0000133 | 0.0031738 |
10535 | 86 | 0.9960670 | 0.9923370 |
10535 | 112 | 0.0000112 | 0.0002035 |
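A typical use of this table is to find polygon pairs that are nearly identical in both databases. The sketch below re-enters the five rows shown above by hand and keeps the pairs for which both overlap values exceed 0.9; the cutoff is an arbitrary choice for illustration, and it is assumed that overlap12 and overlap21 are proportions measured relative to each of the two polygons.

```r
# The five rows displayed above, re-entered by hand
glonaf <- data.frame(
  entity_ID = rep(10535, 5),
  glonaf_ID = c(2, 9, 43, 86, 112),
  overlap12 = c(0.0000234, 0.0000063, 0.0000133, 0.9960670, 0.0000112),
  overlap21 = c(0.0003414, 0.0001627, 0.0031738, 0.9923370, 0.0002035)
)

# Pairs whose polygons cover each other almost entirely (arbitrary 0.9 cutoff)
matches <- subset(glonaf, overlap12 > 0.9 & overlap21 > 0.9)
matches
```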
gmba <- GIFT_overlap(resource = "gmba")

kable(gmba[1:5, ], "html") %>%
  kable_styling(full_width = FALSE)
entity_ID | gmba_ID | overlap12 | overlap21 |
---|---|---|---|
10529 | 231 | 0.0000026 | 0.0031734 |
10529 | 234 | 0.0000274 | 0.0042164 |
10529 | 236 | 0.0002481 | 0.5110019 |
10529 | 237 | 0.0007120 | 1.0000002 |
10529 | 238 | 0.0006033 | 1.0000003 |