Title: | Prepare Movebank Data for Publication |
---|---|
Description: | Prepare animal tracking data from 'Movebank' (<https://movebank.org>) for publication in a research repository. With 'movepub' you can document data with metadata following the Data Package standard and transform these to Darwin Core and Ecological Metadata Language ('EML') for publication to the Global Biodiversity Information Facility ('GBIF') and the Ocean Biodiversity Information System ('OBIS'). |
Authors: | Peter Desmet [aut, cre] (<https://orcid.org/0000-0002-8442-8025>, Research Institute for Nature and Forest (INBO)), Sanne Govaert [ctb] (<https://orcid.org/0000-0002-8939-1305>, Research Institute for Nature and Forest (INBO)), Sarah Davidson [ctb] (<https://orcid.org/0000-0002-2766-9201>, Max Planck Institute of Animal Behavior), Research Institute for Nature and Forest (INBO) [cph] (https://www.vlaanderen.be/inbo/en-gb/), NLBIF [fnd] (https://www.nlbif.nl/move2gbif-gps-zendergegevens-van-dieren-mobiliseren-naar-movebank-en-gbif/), European Union [fnd] (https://dto-bioflow.eu/) |
Maintainer: | Peter Desmet <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2024-12-19 05:54:32 UTC |
Source: | https://github.com/inbo/movepub |
Adds Movebank data (reference-data, gps, acceleration, accessory-measurements) as a Data Resource to a Frictionless Data Package.
The function extends frictionless::add_resource().
The title, definition, format and URI of each field are looked up in the
latest version of the Movebank Attribute Dictionary and included in the Table
Schema of the resource.
add_resource(package, resource_name, files, keys = TRUE)
package |
Data Package object, as returned by |
resource_name |
Name of the Data Resource. |
files |
One or more paths to CSV file(s) that contain the data for this resource, as a character (vector). |
keys |
If |
See Get started for examples.
Provided package with one additional resource.
Get metadata from DataCite and transform to EML.
datacite_to_eml(doi)
doi |
DOI of a dataset. |
EML list that can be extended and/or written to file with EML::write_eml().
Other support functions: get_aphia_id(), get_mvb_term()
This function wraps worrms::wm_name2id_() so that it returns a data frame rather than a list.
It also silences "not found" warnings, returning NA instead.
get_aphia_id(x)
x |
A (vector with) taxonomic name(s). |
Data frame with name, aphia_id, aphia_lsid and aphia_url.
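The list-to-data-frame wrapping described above can be sketched without a network call. This is only an illustration, not the package's implementation: lookup_ids() is a hypothetical stand-in for the underlying lookup, the ID it returns is a placeholder rather than a real AphiaID, and the aphia_lsid and aphia_url columns are left out.

```r
# Hypothetical stand-in for a lookup that returns a named list and
# silently omits names it cannot find (no web service is called).
lookup_ids <- function(names) {
  known <- list("Mola mola" = 1L) # placeholder ID, not a real AphiaID
  known[intersect(names, names(known))]
}

# Turn the list into a data frame with one row per input name,
# using NA for names that were not found.
names_to_df <- function(x) {
  found <- lookup_ids(x)
  ids <- vapply(
    x,
    function(name) if (is.null(found[[name]])) NA_integer_ else found[[name]],
    integer(1),
    USE.NAMES = FALSE
  )
  data.frame(name = x, aphia_id = ids, stringsAsFactors = FALSE)
}

result <- names_to_df(c("Mola mola", "not_a_name"))
```

Keeping one row per input name, including the ones that were not found, makes it straightforward to join the result back to a table of taxa.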
Other support functions: datacite_to_eml(), get_mvb_term()
get_aphia_id("Mola mola")
get_aphia_id(c("Mola mola", "not_a_name"))
Search for a term by its label in the Movebank Attribute Dictionary (MVB).
Returns, in order of preference: the term with a matching prefLabel, the term with a matching altLabel, or an error when no matching term is found.
get_mvb_term(label)
label |
Label of the term to look for. Case will be ignored and |
List with term information.
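The lookup order (prefLabel first, then altLabel, otherwise an error) can be sketched as below. This is a minimal illustration under stated assumptions, not the package's code: find_term() and the two-entry vocabulary are hypothetical, and only case-insensitive matching is implemented.

```r
# Tiny illustrative vocabulary; the entries and IDs are made up for this sketch.
vocabulary <- list(
  list(id = "term-1", prefLabel = "animal ID", altLabel = "individual ID"),
  list(id = "term-2", prefLabel = "deploy on timestamp", altLabel = "deploy on date")
)

# Hypothetical helper: match on prefLabel first, then altLabel, ignoring case.
find_term <- function(label, terms = vocabulary) {
  key <- tolower(label)
  for (field in c("prefLabel", "altLabel")) {
    hit <- Filter(function(term) identical(tolower(term[[field]]), key), terms)
    if (length(hit) > 0) {
      return(hit[[1]])
    }
  }
  stop("No matching term found for label: ", label, call. = FALSE)
}

find_term("ANIMAL id")$id      # matched on prefLabel
find_term("Deploy On Date")$id # matched on altLabel
```

Checking all preferred labels before any alternative label means an exact preferred match always wins, even when another term lists the same text as an alternative.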
Other support functions: datacite_to_eml(), get_aphia_id()
get_mvb_term("animal_id")
get_mvb_term("Deploy.On.Date")
A sample Movebank dataset with GPS tracking data, formatted as a Frictionless Data Package and read by read_package().
o_assen
An object of class datapackage (inherits from list) of length 7.
This sample is derived from the Zenodo-deposited dataset Dijkstra et al. (2022), but excludes the acceleration data.
https://doi.org/10.5281/zenodo.10053903
## Not run:
# The data in o_assen was created with the code below
o_assen <-
  read_package("https://zenodo.org/records/10053903/files/datapackage.json") %>%
  remove_resource("acceleration")
o_assen$title <- "O_ASSEN - Eurasian oystercatchers (Haematopus ostralegus, Haematopodidae) breeding in Assen (the Netherlands)"
o_assen$licenses[[1]]$name <- "CC0-1.0"
o_assen$contributors[[1]]$title <- "Vogelwerkgroep Assen"
o_assen$contributors[[1]]$role <- "rightsHolder"
usethis::use_data(o_assen, overwrite = TRUE)
## End(Not run)
Transforms a Movebank dataset (formatted as a Frictionless Data Package) to a Darwin Core Archive.
write_dwc(
  package,
  directory,
  dataset_id = package$id,
  dataset_name = package$title,
  license = NULL,
  rights_holder = NULL
)
package |
A Frictionless Data Package of Movebank data, as returned by
|
directory |
Path to local directory to write files to. |
dataset_id |
Identifier for the dataset. |
dataset_name |
Title of the dataset. |
license |
License of the dataset. |
rights_holder |
Acronym of the organization owning or managing the rights over the data. |
The resulting files can be uploaded to an IPT for publication to GBIF and/or OBIS.
A corresponding eml.xml metadata file can be created with write_eml().
See vignette("movepub") for an example.
CSV and meta.xml files written to disk.
And invisibly, a list of data frames with the transformed data.
This function follows recommendations by Peter Desmet, Sarah Davidson, John Wieczorek and others and transforms data to:
An Occurrence core.
A meta.xml file.
Key features of the Darwin Core transformation:
Deployments (animal+tag associations) are parent events, with tag attachment (a human observation) and GPS positions (machine observations) as child events.
No information about the parent event is provided other than its ID, meaning that data can be expressed in an Occurrence core with one row per observation and parentEventID shared by all occurrences in a deployment.
The tag attachment event often contains metadata about the animal (sex, life stage, comments) and deployment as a whole. The sex and life stage are additionally provided in an Extended Measurement Or Facts extension, where values are mapped to a controlled vocabulary recommended by OBIS.
No event/occurrence is created for the deployment end, since the end date is often undefined, unreliable and/or does not represent an animal occurrence.
Only visible (non-outlier) GPS records that fall within a deployment are included.
GPS positions are downsampled to the first GPS position per hour, to reduce the size of high-frequency data. It is possible for a deployment to contain no GPS positions, e.g. if the tag malfunctioned right after deployment.
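The filtering and downsampling rules above can be sketched in base R. This is an illustration of the general technique, not the code used by write_dwc(): the gps data frame and its column names are hypothetical, and "per hour" is assumed to mean per clock hour in UTC.

```r
# Hypothetical GPS records for a single deployment (not package data).
gps <- data.frame(
  deployment_id = "dep1",
  timestamp = as.POSIXct(
    c(
      "2020-01-01 10:05:00", "2020-01-01 10:35:00",
      "2020-01-01 11:02:00", "2020-01-01 11:58:00",
      "2020-01-01 13:00:00"
    ),
    tz = "UTC"
  ),
  visible = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Keep only visible (non-outlier) records.
gps <- gps[gps$visible, ]

# Sort by time, then keep the first record per deployment and clock hour.
gps <- gps[order(gps$deployment_id, gps$timestamp), ]
hour <- format(gps$timestamp, "%Y-%m-%d %H", tz = "UTC")
downsampled <- gps[!duplicated(data.frame(gps$deployment_id, hour)), ]

downsampled$timestamp
```

In this sketch the 10:35 record is dropped because 10:05 is earlier in the same hour, and the 11:02 record is dropped because it is flagged as not visible, so 11:58 becomes the first position for that hour.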
Parameters or metadata are used to set the following record-level terms:
dwc:datasetID: dataset_id, defaulting to package$id.
dwc:datasetName: dataset_name, defaulting to package$title.
dcterms:license: license, defaulting to the first license name (e.g. CC0-1.0) in package$licenses.
dcterms:rightsHolder: rights_holder, defaulting to the first contributor in package$contributors with role rightsHolder.
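The defaulting rules for these record-level terms can be sketched as follows. This is a hedged illustration, not the package's implementation: record_level_terms() and the pkg list are hypothetical, and using the contributor's title as the rights holder value is an assumption.

```r
# Hypothetical helper illustrating the documented defaults; not movepub code.
record_level_terms <- function(package,
                               dataset_id = package$id,
                               dataset_name = package$title,
                               license = NULL,
                               rights_holder = NULL) {
  # Default license: name of the first license in the package metadata.
  if (is.null(license) && length(package$licenses) > 0) {
    license <- package$licenses[[1]]$name
  }
  # Default rights holder: first contributor with role "rightsHolder"
  # (taking its title is an assumption made for this sketch).
  if (is.null(rights_holder)) {
    is_holder <- vapply(
      package$contributors,
      function(x) identical(x$role, "rightsHolder"),
      logical(1)
    )
    if (any(is_holder)) {
      rights_holder <- package$contributors[[which(is_holder)[1]]]$title
    }
  }
  list(
    datasetID = dataset_id,
    datasetName = dataset_name,
    license = license,
    rightsHolder = rights_holder
  )
}

# Made-up package metadata with the same shape as a Data Package list.
pkg <- list(
  id = "https://example.org/my-dataset",
  title = "Example dataset",
  licenses = list(list(name = "CC0-1.0")),
  contributors = list(
    list(title = "A. Person", role = "contact"),
    list(title = "Vogelwerkgroep Assen", role = "rightsHolder")
  )
)

terms <- record_level_terms(pkg)
overridden <- record_level_terms(pkg, license = "CC-BY-4.0")
```

Explicitly passed arguments take precedence over the metadata, so the second call keeps the supplied license while still deriving the rights holder from the contributors.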
Other dwc functions: write_eml()
write_dwc(o_assen, directory = "my_directory")

# Clean up (don't do this if you want to keep your files)
unlink("my_directory", recursive = TRUE)
Transforms the metadata of a published Movebank dataset (with a DOI) to an Ecological Metadata Language (EML) file.
write_eml(
  doi,
  directory,
  contact = NULL,
  study_id = NULL,
  derived_paragraph = TRUE
)
doi |
DOI of the original dataset, used to get metadata. |
directory |
Path to local directory to write files to. |
contact |
Person to be set as resource contact and metadata provider.
To be provided as a |
study_id |
Identifier of the Movebank study from which the dataset was
derived (e.g. |
derived_paragraph |
If |
The resulting EML file can be uploaded to an IPT for publication to GBIF and/or OBIS.
A corresponding Darwin Core Archive can be created with write_dwc().
See vignette("movepub") for an example.
eml.xml file written to disk.
And invisibly, an EML::eml object.
Metadata are derived from the original dataset by looking up its doi in DataCite and transforming the result to EML.
The following properties are set:
title: Original dataset title.
description: Original dataset description.
If derived_paragraph = TRUE, a generated paragraph is added, e.g.:
Data have been standardized to Darwin Core using the movepub R package and are downsampled to the first GPS position per hour. The original data are available in Dijkstra et al. (2023, https://doi.org/10.5281/zenodo.10053903), a deposit of Movebank study 1605797471.
license: License of the original dataset.
creators: Creators of the original dataset.
contact: contact or first creator of the original dataset.
metadata provider: contact or first creator of the original dataset.
keywords: Keywords of the original dataset.
alternative identifier: DOI of the original dataset. As a result, no new DOI will be created when publishing to GBIF.
external link and alternative identifier: URL created from study_id or from the first "derived from" related identifier in the original dataset.
The following properties are not set:
type
subtype
update frequency
publishing organization
geographic coverage
taxonomic coverage
temporal coverage
associated parties
project data
sampling methods
citations
collection data: not applicable.
Other dwc functions: write_dwc()
(write_eml(doi = "10.5281/zenodo.10053903", directory = "my_directory"))

# Clean up (don't do this if you want to keep your files)
unlink("my_directory", recursive = TRUE)