| Title: | Prepare Movebank Data for Publication |
|---|---|
| Description: | Prepare animal tracking data from 'Movebank' (<https://www.movebank.org/>) for publication in a research repository. With 'movepub' you can document data with metadata following the Data Package standard and transform these to Darwin Core and Ecological Metadata Language ('EML') for publication to the Global Biodiversity Information Facility ('GBIF') and the Ocean Biodiversity Information System ('OBIS'). |
| Authors: | Peter Desmet [aut, cre] (ORCID: <https://orcid.org/0000-0002-8442-8025>, affiliation: Research Institute for Nature and Forest (INBO)), Sanne Govaert [aut] (ORCID: <https://orcid.org/0000-0002-8939-1305>, affiliation: Research Institute for Nature and Forest (INBO)), Sarah Davidson [ctb] (ORCID: <https://orcid.org/0000-0002-2766-9201>, affiliation: Max Planck Institute of Animal Behavior), Research Institute for Nature and Forest (INBO) [cph] (ROR: <https://ror.org/00j54wy13>), NLBIF [fnd] (https://www.nlbif.nl/move2gbif-gps-zendergegevens-van-dieren-mobiliseren-naar-movebank-en-gbif/), European Union [fnd] (https://doi.org/10.3030/101112823) |
| Maintainer: | Peter Desmet <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.4.0.9000 |
| Built: | 2026-05-28 10:59:43 UTC |
| Source: | https://github.com/inbo/movepub |
Adds Movebank data (reference-data, gps, acceleration,
accessory-measurements) as a Data Resource to a Data Package.
The function extends frictionless::add_resource() by adding the following
to the Table Schema of the resource:
The title, definition, format and URI for each field, from the latest version of the Movebank Attribute Dictionary.
The primary key of the resource and foreign keys between resources.
add_resource(package, resource_name, files, keys = TRUE)
| package | Data Package object, as returned by |
| resource_name | Name of the Data Resource. |
| files | One or more paths to CSV file(s) that contain the data for this resource, as a character (vector). |
| keys | If |
See vignette("movepub") for examples.
package with one additional resource.
Get metadata from DataCite and transform to EML.
datacite_to_eml(doi)
| doi | DOI of a dataset. |
EML list that can be extended and/or written to file with
EML::write_eml().
Other support functions:
get_aphia_id(),
html_to_docbook()
datacite_to_eml("10.5281/zenodo.10053903")
This function wraps worrms::wm_name2id_() so that it returns a data frame
rather than a list.
It also silences "not found" warnings, returning NA instead.
get_aphia_id(x)
| x | A (vector with) taxonomic name(s). |
Data frame with name, aphia_id, aphia_lsid and aphia_url.
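The list-to-data-frame behaviour can be illustrated without calling WoRMS. The sketch below is not the package's implementation: `lookup` is a hypothetical stand-in for a name-to-ID response (the ID in it is an assumed value for illustration), `names_to_ids()` is a hypothetical helper, and only the `name` and `aphia_id` columns are reproduced.

```r
# Hypothetical stand-in for a name-to-ID lookup: names that were not found
# are simply absent from the list. The ID is an assumed, illustrative value.
lookup <- list("Mola mola" = 127405L)

# Hypothetical helper: one row per input name, NA where no ID is available
names_to_ids <- function(x, lookup) {
  ids <- vapply(
    x,
    function(name) {
      if (is.null(lookup[[name]])) NA_integer_ else lookup[[name]]
    },
    integer(1),
    USE.NAMES = FALSE
  )
  data.frame(name = x, aphia_id = ids, stringsAsFactors = FALSE)
}

result <- names_to_ids(c("Mola mola", "not_a_name"), lookup)
```

Returning `NA` rather than raising a warning keeps the output aligned with the input vector, so it can be joined back to the source data by position or by name.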
Other support functions:
datacite_to_eml(),
html_to_docbook()
get_aphia_id("Mola mola")
get_aphia_id(c("Mola mola", "not_a_name"))
Converts text with HTML syntax to DocBook, splitting paragraphs and headers into separate elements. Only a subset of HTML tags is supported (see transformation details); all other HTML syntax is removed.
html_to_docbook(string)
| string | Character (vector) that may contain HTML syntax. |
A character vector with HTML converted to DocBook.
The function splits text into a character vector, with one element for each
paragraph, header or line break (\n).
The remaining HTML is converted to DocBook, but only those tags supported by
EML for paragraphs.
All other HTML/DocBook syntax is sanitized and empty elements are removed.
| Input | Output |
|---|---|
| <h1>...</h1> | ... (separate element) |
| <p>...</p> | ... (separate element) |
| <div>...</div> | ... (separate element) |
| <h2>...</h2> | ... (separate element) |
| <h3>...</h3> | ... (separate element) |
| <h4>...</h4> | ... (separate element) |
| <h5>...</h5> | ... (separate element) |
| <h6>...</h6> | ... (separate element) |
| ...\n | ... (separate element) |
| <ul>...</ul> | <itemizedlist>...</itemizedlist> |
| <ol>...</ol> | <orderedlist>...</orderedlist> |
| <li>...</li> | <listitem><para>...</para></listitem> |
| <em>...</em> | <emphasis>...</emphasis> |
| <i>...</i> | <emphasis>...</emphasis> |
| <strong>...</strong> | <emphasis>...</emphasis> |
| <b>...</b> | <emphasis>...</emphasis> |
| <sub>...</sub> | <subscript>...</subscript> |
| <sup>...</sup> | <superscript>...</superscript> |
| <pre>...</pre> | <literalLayout>...</literalLayout> |
| <a href="http://example.com">...</a> | <ulink url="http://example.com"><citetitle>...</citetitle></ulink> |
| <code>...</code> | ... (HTML element sanitized) |
| <foo>...</foo> | ... (HTML element sanitized) |
| <span class="small">...</span> | ... (HTML property sanitized) |
| <p class="small">...</p> | ... (HTML property sanitized) |
| <img src="file.png"> | empty string (HTML element sanitized) |
| <emphasis>...</emphasis> | ... (DocBook element sanitized) |
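A few of the inline mappings in the table can be sketched with regular expressions. This is only an illustration of the input/output relationship, not the implementation used by html_to_docbook(); the helper name `inline_tags_sketch()` is hypothetical, and splitting into elements, lists, links and sanitization are not covered.

```r
# Sketch only: maps a handful of inline HTML tags to their DocBook
# counterparts, as listed in the table. Hypothetical helper, not movepub code.
inline_tags_sketch <- function(x) {
  # <em>, <i>, <strong> and <b> all become <emphasis>
  x <- gsub("<(/?)(em|i|strong|b)>", "<\\1emphasis>", x)
  # <sub> becomes <subscript>, <sup> becomes <superscript>
  x <- gsub("<(/?)sub>", "<\\1subscript>", x)
  x <- gsub("<(/?)sup>", "<\\1superscript>", x)
  x
}

out <- inline_tags_sketch("H<sub>2</sub>O is <b>wet</b> and <em>clear</em>.")
```

The optional `/` is captured so that opening and closing tags are rewritten by the same pattern.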
Capture EML with eml <- movepub::write_eml() or read with
EML::read_eml().
Assign output of html_to_docbook() to eml$dataset$abstract$para.
Write EML with EML::write_eml().
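The three steps above can be sketched as follows. To keep the example self-contained, `eml` is a plain nested list standing in for the object returned by write_eml() or EML::read_eml(), and `paragraphs` stands in for the output of html_to_docbook(). The helper `set_abstract()` is hypothetical and not part of movepub.

```r
# Stand-in for an EML list (step 1); a real one would come from
# movepub::write_eml() or EML::read_eml()
eml <- list(
  dataset = list(
    title = "Example dataset",
    abstract = list(para = "Old abstract.")
  )
)

# Stand-in for the character vector returned by html_to_docbook()
paragraphs <- c("This is <emphasis>bold</emphasis>.", "Paragraph 1")

# Hypothetical helper: replace the abstract paragraphs (step 2)
set_abstract <- function(eml, paragraphs) {
  eml$dataset$abstract$para <- paragraphs
  eml
}

eml <- set_abstract(eml, paragraphs)

# Step 3, with the EML package installed and a real EML list:
# EML::write_eml(eml, "eml.xml")
```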
Other support functions:
datacite_to_eml(),
get_aphia_id()
html_to_docbook(
  c(
    "This is <b>bold</b>.\nParagraph 1\n\nParagraph 2<p></p>",
    "What follows is a list: <ul><li>Item 1</li><li>Item 2</li></ul>"
  )
)
A sample Movebank dataset with GPS tracking data, formatted as a
Data Package and read by
read_package().
o_assen
An object of class datapackage (inherits from list) of length 7.
This sample is derived from the Zenodo-deposited dataset Dijkstra et al. (2023), but excludes the acceleration data.
## Not run:
# The data in o_assen was created with the code below
o_assen <- read_package("https://zenodo.org/records/10053903/files/datapackage.json") |>
  remove_resource("acceleration")
o_assen$title <- paste(
  "O_ASSEN - Eurasian oystercatchers (Haematopus ostralegus, Haematopodidae),",
  "breeding in Assen (the Netherlands)"
)
o_assen$licenses[[1]]$name <- "CC0-1.0"
o_assen$contributors[[1]]$title <- "Vogelwerkgroep Assen"
o_assen$contributors[[1]]$role <- "rightsHolder"
usethis::use_data(o_assen, overwrite = TRUE)
## End(Not run)
Transforms a Data Package with Movebank data to a Darwin Core Archive.
write_dwc(
  package,
  directory,
  dataset_id = package$id,
  dataset_name = package$title,
  license = NULL,
  rights_holder = NULL
)
| package | A Data Package with Movebank data, as returned by |
| directory | Path to local directory to write files to. |
| dataset_id | Identifier for the dataset. |
| dataset_name | Title of the dataset. |
| license | License of the dataset. |
| rights_holder | Acronym of the organization owning or managing the rights over the data. |
The resulting files can be uploaded to an IPT for
publication to GBIF and/or OBIS.
A corresponding eml.xml metadata file can be created with write_eml().
See vignette("movepub") for an example.
CSV and meta.xml files written to disk.
And invisibly, a list of data frames with the transformed data.
This function follows recommendations by Peter Desmet, Sarah Davidson, John Wieczorek and others and transforms data to:
An Occurrence core.
A meta.xml file.
Key features of the Darwin Core transformation:
Deployments (animal+tag associations) are parent events, with tag
attachment (a human observation) and GPS positions (machine observations)
as child events.
No information about the parent event is provided other than its ID,
meaning that data can be expressed in an Occurrence core with one row per
observation and parentEventID shared by all occurrences in a deployment.
The tag attachment event often contains metadata about the animal (sex, life stage, comments) and deployment as a whole. Sex and life stage are additionally provided in an Extended Measurement Or Facts extension, where values are mapped to a controlled vocabulary recommended by OBIS.
No event/occurrence is created for the deployment end, since the end date is often undefined, unreliable and/or does not represent an animal occurrence.
Only visible (non-outlier) GPS records that fall within a deployment are
included.
GPS positions are downsampled to the first GPS position per hour, to reduce the size of high-frequency data. It is possible for a deployment to contain no GPS positions, e.g. if the tag malfunctioned right after deployment.
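Downsampling to the first position per hour can be sketched in base R as below. This is a minimal illustration, not the code used by write_dwc(). The `gps` data frame and its column names are made up, and grouping by deployment as well as by hour is an assumption.

```r
# Made-up GPS records for two deployments (column names are illustrative)
gps <- data.frame(
  deployment = c("a", "a", "a", "b"),
  timestamp = as.POSIXct(
    c(
      "2023-01-01 10:05:00", "2023-01-01 10:40:00",
      "2023-01-01 11:15:00", "2023-01-01 10:20:00"
    ),
    tz = "UTC"
  )
)

# Sort so that the first row in each group is the earliest position
gps <- gps[order(gps$deployment, gps$timestamp), ]

# Hour bucket per record, e.g. "2023-01-01 10"
gps$hour <- format(gps$timestamp, "%Y-%m-%d %H", tz = "UTC")

# Keep only the first record per deployment and hour (assumed grouping)
first_per_hour <- gps[!duplicated(gps[c("deployment", "hour")]), ]
```

Here the 10:40 position of deployment `a` is dropped because an earlier position exists in the same hour.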
Parameters or metadata are used to set the following record-level terms:
dwc:datasetID: dataset_id, defaulting to package$id.
dwc:datasetName: dataset_name, defaulting to package$title.
dcterms:license: license, defaulting to the first license name
(e.g. CC0-1.0) in package$licenses.
dcterms:rightsHolder: rights_holder, defaulting to the first
contributor in package$contributors with role rightsHolder.
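The defaulting rules above can be sketched with a plain list standing in for a Data Package. The helper `resolve_terms()` is hypothetical, the `id` value and the second contributor are made up, and using the contributor's `title` as the rights holder value is an assumption.

```r
# Plain list standing in for a Data Package (id and second contributor made up)
package <- list(
  id = "hypothetical-id",
  title = "O_ASSEN - Eurasian oystercatchers (Haematopus ostralegus, Haematopodidae), breeding in Assen (the Netherlands)",
  licenses = list(list(name = "CC0-1.0")),
  contributors = list(
    list(title = "Vogelwerkgroep Assen", role = "rightsHolder"),
    list(title = "Hypothetical Person", role = "contributor")
  )
)

# Hypothetical helper mirroring the documented defaults
resolve_terms <- function(package,
                          dataset_id = package$id,
                          dataset_name = package$title,
                          license = NULL,
                          rights_holder = NULL) {
  if (is.null(license)) {
    # First license name in package$licenses
    license <- package$licenses[[1]]$name
  }
  if (is.null(rights_holder)) {
    # First contributor with role rightsHolder (title used by assumption)
    holders <- Filter(
      function(contributor) identical(contributor$role, "rightsHolder"),
      package$contributors
    )
    rights_holder <- if (length(holders) > 0) holders[[1]]$title else NA_character_
  }
  list(
    datasetID = dataset_id,
    datasetName = dataset_name,
    license = license,
    rightsHolder = rights_holder
  )
}

terms <- resolve_terms(package)
terms_custom <- resolve_terms(package, license = "CC-BY-4.0")
```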
The source data should have the following resources and fields:
reference-data with at least the fields animal-id, animal-taxon,
and tag-id.
Records must have a deploy-on-date to be retained.
gps with at least the fields individual-local-identifier,
tag-local-identifier, and timestamp.
Records must have a location-lat, visible = TRUE and a link with the
reference data to be retained.
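The retention rules can be illustrated with two small, made-up data frames. This is a sketch, not the filtering code of write_dwc(). It assumes that `gps` links to `reference-data` by matching `individual-local-identifier` to `animal-id` and `tag-local-identifier` to `tag-id`, and it ignores the deployment time window.

```r
# Made-up reference data: the second deployment lacks a deploy-on-date
reference_data <- data.frame(
  `animal-id` = c("A1", "A2"),
  `tag-id` = c("T1", "T2"),
  `deploy-on-date` = c("2023-05-01 12:00:00", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Made-up GPS records
gps <- data.frame(
  `individual-local-identifier` = c("A1", "A1", "A1", "A2", "A3"),
  `tag-local-identifier` = c("T1", "T1", "T1", "T2", "T3"),
  `location-lat` = c(53.0, NA, 53.1, 53.2, 53.3),
  visible = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Reference records need a deploy-on-date
deployments <- reference_data[!is.na(reference_data[["deploy-on-date"]]), ]

# Assumed link: animal and tag identifiers must both match
ref_key <- paste(deployments[["animal-id"]], deployments[["tag-id"]])
gps_key <- paste(
  gps[["individual-local-identifier"]],
  gps[["tag-local-identifier"]]
)

# GPS records need a latitude, visible = TRUE and a linked deployment
keep <- !is.na(gps[["location-lat"]]) & gps$visible & gps_key %in% ref_key
gps_retained <- gps[keep, ]
```

Only the first GPS record survives: the others lack a latitude, are flagged as not visible, or cannot be linked to a retained deployment.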
Other transformation functions:
write_eml()
write_dwc(o_assen, directory = "my_directory")

# Clean up (don't do this if you want to keep your files)
unlink("my_directory", recursive = TRUE)
Transforms the metadata of a published Movebank dataset (with a DOI) to an Ecological Metadata Language (EML) file.
write_eml(
  doi,
  directory,
  contact = NULL,
  study_id = NULL,
  derived_paragraph = TRUE
)
| doi | DOI of the original dataset, used to get metadata. |
| directory | Path to local directory to write files to. |
| contact | Person to be set as resource contact and metadata provider. To be provided as a |
| study_id | Identifier of the Movebank study from which the dataset was derived (e.g. |
| derived_paragraph | If |
The resulting EML file can be uploaded to an IPT
for publication to GBIF and/or OBIS.
A corresponding Darwin Core Archive can be created with write_dwc().
See vignette("movepub") for an example.
eml.xml file written to disk.
And invisibly, an EML::eml object.
Metadata are derived from the original dataset by looking up its doi in
DataCite and transforming these to EML.
The following properties are set:
title: Original dataset title.
description: Original dataset description.
If derived_paragraph = TRUE a generated paragraph is added, e.g.:
Data have been standardized to Darwin Core using the movepub R package and are downsampled to the first GPS position per hour. The original data are available in Dijkstra et al. (2023, doi:10.5281/zenodo.10053903), a deposit of Movebank study 1605797471.
license: License of the original dataset.
creators: Creators of the original dataset.
contact: contact or first creator of the original dataset.
metadata provider: contact or first creator of the original dataset.
keywords: Keywords of the original dataset.
alternative identifier: DOI of the original dataset. As a result, no new DOI will be created when publishing to GBIF.
external link and alternative identifier: URL created from
study_id or the first derived from related identifier in the original
dataset.
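The generated paragraph quoted in the list above combines a short citation, the DOI and the Movebank study identifier. The sketch below shows one way to assemble such a sentence. The helper `derived_paragraph_sketch()` and its arguments are hypothetical, and write_eml() may build the text differently.

```r
# Hypothetical helper: assembles a paragraph with the same wording as the
# documented example from its variable parts
derived_paragraph_sketch <- function(authors, year, doi, study_id) {
  paste0(
    "Data have been standardized to Darwin Core using the movepub R package ",
    "and are downsampled to the first GPS position per hour. ",
    "The original data are available in ",
    authors, " (", year, ", doi:", doi, "), ",
    "a deposit of Movebank study ", study_id, "."
  )
}

paragraph <- derived_paragraph_sketch(
  authors = "Dijkstra et al.",
  year = "2023",
  doi = "10.5281/zenodo.10053903",
  study_id = "1605797471"
)
```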
The following properties are not set:
type
subtype
update frequency
publishing organization
geographic coverage
taxonomic coverage
temporal coverage
associated parties
project data
sampling methods
citations
collection data: not applicable.
Other transformation functions:
write_dwc()
(write_eml(doi = "10.5281/zenodo.10053903", directory = "my_directory"))

# Clean up (don't do this if you want to keep your files)
unlink("my_directory", recursive = TRUE)