GBIF scientific name matching

Introduction

Working with different partners, institutes and researchers results in a diversity of taxonomic names being used for the same species. This hampers comparison among datasets, since aggregating or filtering on specific species is often required. Translating all species names to a common taxonomic backbone, which provides a unique ID for each species name, makes this possible. The gbif_species_name_match function supports matching with the GBIF taxonomic backbone.

Aim

This function adds species information from the GBIF backbone to any data table (data.frame) by requesting it from the GBIF API. For each match, the corresponding accepted name is looked up. Nevertheless, matching errors will always occur, so checking the results manually is still required!

Functionality

The gbif_species_name_match function extends the name matching function provided by rgbif so that it can be applied to a data.frame.

The function becomes available after loading the inborutils package:

library(inborutils)

Consider the example data set species_example:

knitr::kable(species_example)
| speciesName | kingdom | euConcernStatus |
|:---|:---|:---|
| Alopochen aegyptiaca | Animalia | under consideration |
| Cotoneaster ganghobaensis | Plantae | NA |
| Cotoneaster hylmoei | Plantae | NA |

To add the species information using the names in the speciesName column and the default GBIF fields:

my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName")
## [1] "All column names present"
## New names:
## • `kingdom` -> `kingdom...2`
## • `kingdom` -> `kingdom...10`
knitr::kable(my_data_update)
| speciesName | kingdom...2 | euConcernStatus | usageKey | scientificName | rank | order | matchType | phylum | kingdom...10 | genus | class | confidence | synonym | status | family |
|:---|:---|:---|---:|:---|:---|:---|:---|:---|:---|:---|:---|---:|:---|:---|:---|
| Alopochen aegyptiaca | Animalia | under consideration | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) | SPECIES | Anseriformes | EXACT | Chordata | Animalia | Alopochen | Aves | 99 | FALSE | ACCEPTED | Anatidae |
| Cotoneaster ganghobaensis | Plantae | NA | 3025989 | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 99 | FALSE | ACCEPTED | Rosaceae |
| Cotoneaster hylmoei | Plantae | NA | 3025758 | Cotoneaster hylmoei Flinck & J.Fryer | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 98 | TRUE | SYNONYM | Rosaceae |
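Because errors remain possible, it can help to list the matches that deserve a manual check. The sketch below copies a few columns of the result above into a plain data.frame, so it runs without calling the GBIF API, and flags rows that are not an exact match of an accepted name or whose confidence is below a threshold. The helper `flag_for_review` and the threshold of 99 are illustrative assumptions, not part of inborutils.

```r
# Subset of the columns returned above, copied from the table so that the
# example runs offline (no GBIF API call).
matched <- data.frame(
  speciesName = c("Alopochen aegyptiaca", "Cotoneaster ganghobaensis",
                  "Cotoneaster hylmoei"),
  matchType   = c("EXACT", "EXACT", "EXACT"),
  confidence  = c(99, 99, 98),
  status      = c("ACCEPTED", "ACCEPTED", "SYNONYM"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag rows that are not an EXACT match, not an ACCEPTED
# name, or have a confidence below `min_confidence` (assumed threshold).
# Rows without a match (NA values) are flagged as well.
flag_for_review <- function(df, min_confidence = 99) {
  df$needsReview <- !(df$matchType %in% "EXACT") |
    !(df$status %in% "ACCEPTED") |
    is.na(df$confidence) |
    df$confidence < min_confidence
  df
}

reviewed <- flag_for_review(matched)
reviewed[reviewed$needsReview, "speciesName"]
# [1] "Cotoneaster hylmoei"
```

On real output, the same helper can be applied directly to `my_data_update`.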

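Because kingdom is both a column of species_example and one of the default GBIF fields, both columns are kept and renamed by position (kingdom...2 and kingdom...10), as reported in the messages above. If the default fields ('usageKey', 'scientificName', 'rank', 'order', 'matchType', 'phylum', 'kingdom', 'genus', 'class', 'confidence', 'synonym', 'status', 'family') do not suit your needs, you can select other fields with the gbif_terms argument, for example: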

gbif_terms_to_use <- c("canonicalName", "order")
my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName",
                                          gbif_terms = gbif_terms_to_use)
## [1] "All column names present"
knitr::kable(my_data_update)
| speciesName | kingdom | euConcernStatus | canonicalName | order |
|:---|:---|:---|:---|:---|
| Alopochen aegyptiaca | Animalia | under consideration | Alopochen aegyptiaca | Anseriformes |
| Cotoneaster ganghobaensis | Plantae | NA | Cotoneaster ganghobaensis | Rosales |
| Cotoneaster hylmoei | Plantae | NA | Cotoneaster hylmoei | Rosales |
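Since species_example already contains a kingdom column, one way to avoid duplicated kingdom columns is to request the default fields without the ones already present in the input. The minimal sketch below only builds that vector with base R; the field list is copied from the defaults listed above, and the commented call assumes that gbif_terms accepts any subset of them.

```r
# Default GBIF fields, as listed in the text above.
default_terms <- c("usageKey", "scientificName", "rank", "order", "matchType",
                   "phylum", "kingdom", "genus", "class", "confidence",
                   "synonym", "status", "family")

# Column names of species_example (equivalent to names(species_example)).
input_columns <- c("speciesName", "kingdom", "euConcernStatus")

# Keep only the GBIF fields that do not clash with an existing column.
gbif_terms_to_use <- setdiff(default_terms, input_columns)
gbif_terms_to_use

# With inborutils loaded and access to the GBIF API, the call would be:
# my_data_update <- gbif_species_name_match(species_example,
#                                           name = "speciesName",
#                                           gbif_terms = gbif_terms_to_use)
```

The trade-off is that the GBIF kingdom is then no longer returned; keep it (and rename your own column instead) if you want to compare both.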

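If the name of a GBIF field is already used as a column name in your data.frame, a warning is returned stating that the GBIF column is renamed by adding the suffix 1. In the example below, the two columns are eventually shown as scientificName...1 (your original column) and scientificName...4 (the GBIF column):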

df <- species_example
names(df) <- c("scientificName", names(species_example)[2:3])
gbif_terms_to_use <- c("scientificName", "order")
my_data_update <- gbif_species_name_match(df,
                                          name = "scientificName",
                                          gbif_terms = gbif_terms_to_use)
## [1] "All column names present"
## Warning in gbif_species_name_match(df, name = "scientificName", gbif_terms =
## gbif_terms_to_use): Column with names 'scientificName' is also one of the
## returned gbif_terms. GBIF column name is authomatically recalled
## 'scientificName1'.
knitr::kable(my_data_update)
| scientificName...1 | kingdom | euConcernStatus | scientificName...4 | order |
|:---|:---|:---|:---|:---|
| Alopochen aegyptiaca | Animalia | under consideration | Alopochen aegyptiaca (Linnaeus, 1766) | Anseriformes |
| Cotoneaster ganghobaensis | Plantae | NA | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | Rosales |
| Cotoneaster hylmoei | Plantae | NA | Cotoneaster hylmoei Flinck & J.Fryer | Rosales |
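Alternatively, the clash can be avoided before matching by renaming the affected column in your own data. A minimal base R sketch, using a small stand-in for `df` built from the values of species_example; the `input_` prefix is an arbitrary, hypothetical naming choice.

```r
# Small stand-in for `df` from the example above, so the snippet runs
# without inborutils.
df <- data.frame(
  scientificName  = c("Alopochen aegyptiaca", "Cotoneaster ganghobaensis",
                      "Cotoneaster hylmoei"),
  kingdom         = c("Animalia", "Plantae", "Plantae"),
  euConcernStatus = c("under consideration", NA, NA),
  stringsAsFactors = FALSE
)
gbif_terms_to_use <- c("scientificName", "order")

# Find input columns that clash with a requested GBIF field and prefix them
# with "input_" (hypothetical convention). intersect() keeps the column
# order of df, so the new names line up with the selected columns.
clashing <- intersect(names(df), gbif_terms_to_use)
names(df)[names(df) %in% clashing] <- paste0("input_", clashing)
names(df)

# The match then uses the renamed column (requires the GBIF API):
# my_data_update <- gbif_species_name_match(df,
#                                           name = "input_scientificName",
#                                           gbif_terms = gbif_terms_to_use)
```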

Sometimes the same scientific name occurs in different kingdoms (so-called hemihomonyms). To avoid misidentifying a taxon, it can then be useful to specify the kingdom it belongs to. Other taxonomic parameters, such as rank, family or genus, can be added as well. It is also possible to pass non-taxonomic parameters, e.g. strict, which gives more control over the matching behaviour. For more information about all parameters accepted by GBIF, see the documentation of the GBIF species match API.

my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName",
                                          kingdom = "kingdom",
                                          strict = TRUE)
## [1] "All column names present"
knitr::kable(my_data_update)
| speciesName | kingdom...2 | euConcernStatus | usageKey | scientificName | rank | order | matchType | phylum | kingdom...10 | genus | class | confidence | synonym | status | family |
|:---|:---|:---|---:|:---|:---|:---|:---|:---|:---|:---|:---|---:|:---|:---|:---|
| Alopochen aegyptiaca | Animalia | under consideration | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) | SPECIES | Anseriformes | EXACT | Chordata | Animalia | Alopochen | Aves | 100 | FALSE | ACCEPTED | Anatidae |
| Cotoneaster ganghobaensis | Plantae | NA | 3025989 | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 100 | FALSE | ACCEPTED | Rosaceae |
| Cotoneaster hylmoei | Plantae | NA | 3025758 | Cotoneaster hylmoei Flinck & J.Fryer | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 100 | TRUE | SYNONYM | Rosaceae |
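When the input data already contain a kingdom, comparing it with the kingdom returned by GBIF is a quick way to spot possible hemihomonym mix-ups. The sketch below copies the two kingdom columns of the table above into a plain data.frame under simpler names; the helper `kingdom_mismatch` is hypothetical.

```r
# kingdom_input corresponds to kingdom...2 (input value) and kingdom_gbif to
# kingdom...10 (value returned by GBIF) in the table above.
matched <- data.frame(
  speciesName   = c("Alopochen aegyptiaca", "Cotoneaster ganghobaensis",
                    "Cotoneaster hylmoei"),
  kingdom_input = c("Animalia", "Plantae", "Plantae"),
  kingdom_gbif  = c("Animalia", "Plantae", "Plantae"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where either kingdom is missing or the two
# kingdoms differ, i.e. rows worth checking by hand. Never returns NA.
kingdom_mismatch <- function(input, gbif) {
  is.na(gbif) | is.na(input) | input != gbif
}

mismatch <- kingdom_mismatch(matched$kingdom_input, matched$kingdom_gbif)
matched[mismatch, "speciesName"]
```

On the real output, pass ``my_data_update$`kingdom...2` `` and ``my_data_update$`kingdom...10` `` instead; the backticks are needed because of the dots in the repaired column names.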