--- title: "GBIF scientific name matching" author: - Stijn Vanhoey - Damiano Oldoni date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{GBIF scientific name matching} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Introduction Working with different partners/institutes/researchers results in a diversity of taxonomic names to define species. This hardens comparison amongst datasets, as in many occasions, aggregation is aimed for or filtering on specific species. By translating all species names to a common taxonomic backbone (ensuring unique ID's for each species name), this can be done. The `gbif_species_name_match` function supports matching with the GBIF taxonomic backbone. ## Aim This function provides the functionality to add the species information from the GBIF backbone to **any data table (`data.frame`)** by requesting this information via the GBIF API. For each match, the corresponding accepted name is looked for. Nevertheless there will always be errors and control is still required! ## Functionality The `gbif_species_name_match` function extends the matching function provided by [rgbif](https://github.com/ropensci/rgbif) to be compatible with a `data.frame` data structure. Loading the functionality can be done by loading the `inborutils` package: ```{r} library(inborutils) ``` Consider the example data set `species_example`: ```{r} knitr::kable(species_example) ``` To add the species information, using the `scientificName` column, and the default fields: ```{r warning=FALSE} my_data_update <- gbif_species_name_match(species_example, name = "speciesName") ``` ```{r} knitr::kable(my_data_update) ``` When not satisfied by the default fields provided `('usageKey','scientificName','rank','order','matchType','phylum', 'kingdom','genus', 'class','confidence', 'synonym', 'status','family')`, you can alter these by the `gbif_terms` argument, for example: ```{r message=FALSE, warning=FALSE} gbif_terms_to_use <- c("canonicalName", "order") my_data_update <- gbif_species_name_match(species_example, name = "speciesName", gbif_terms = gbif_terms_to_use) ``` ```{r} knitr::kable(my_data_update) ``` If the name of a GBIF field is already in use as column name in your data.frame, the suffix number `1` is added and a warning is returned. For example: ```{r message=FALSE, warning=TRUE} df <- species_example names(df) <- c("scientificName", names(species_example)[2:3]) gbif_terms_to_use <- c("scientificName", "order") my_data_update <- gbif_species_name_match(df, name = "scientificName", gbif_terms = gbif_terms_to_use) ``` ```{r} knitr::kable(my_data_update) ``` Sometimes, a scientific name can occur in different kingdoms, the so-called *hemihomonyms*. To avoid a taxon being misidentified, it is then sometimes useful to specify kingdom it belongs to. You could also add other taxonomic related parameters such as rank, family or genus. It is also possible to pass other not taxonomic related parameters, e.g. `strict` which allows more control on the match behaviour. For more information about all parameters accepted by GBIF, see documentation on [GBIF match](https://www.gbif.org/developer/species#searching). ```{r message=FALSE, warning=FALSE} my_data_update <- gbif_species_name_match(species_example, name = "speciesName", kingdom = "kingdom", strict = TRUE) ``` ```{r} knitr::kable(my_data_update) ```