Title: | Read and Write Frictionless Data Packages |
---|---|
Description: | Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets. |
Authors: | Peter Desmet [aut, cre] (<https://orcid.org/0000-0002-8442-8025>, Research Institute for Nature and Forest (INBO)), Damiano Oldoni [aut] (<https://orcid.org/0000-0003-3445-7562>, Research Institute for Nature and Forest (INBO)), Pieter Huybrechts [aut] (<https://orcid.org/0000-0002-6658-6062>, Research Institute for Nature and Forest (INBO)), Sanne Govaert [aut] (<https://orcid.org/0000-0002-8939-1305>, Research Institute for Nature and Forest (INBO)), Kyle Husmann [ctb] (<https://orcid.org/0000-0001-9875-8976>, Pennsylvania State University), Research Institute for Nature and Forest (INBO) [cph] (https://www.vlaanderen.be/inbo/en-gb/), Research Foundation - Flanders [fnd] (https://lifewatch.be), Beatriz Milz [rev], João Martins [rev] |
Maintainer: | Peter Desmet <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.0.9000 |
Built: | 2024-10-26 06:12:35 UTC |
Source: | https://github.com/frictionlessdata/frictionless-r |
Adds a Data Resource to a Data Package.
The resource will be a Tabular Data Resource.
The resource name can only contain lowercase alphanumeric characters plus ., - and _.
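The naming rule above can be checked before calling add_resource(). The following is a minimal base R sketch, not the package's own validation code; the helper name is_valid_resource_name() is hypothetical and the regular expression is one reading of the rule as stated.

```r
# Hypothetical helper (not part of frictionless): TRUE if the name uses only
# lowercase letters, digits, ".", "-" and "_".
is_valid_resource_name <- function(resource_name) {
  is.character(resource_name) &&
    length(resource_name) == 1 &&
    grepl("^[a-z0-9._-]+$", resource_name, perl = TRUE)
}

is_valid_resource_name("positions_with_schema") # TRUE
is_valid_resource_name("Positions")             # FALSE: uppercase letter
is_valid_resource_name("my resource")           # FALSE: contains a space
```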
add_resource(
  package,
  resource_name,
  data,
  schema = NULL,
  replace = FALSE,
  delim = ",",
  ...
)
package |
Data Package object, as returned by |
resource_name |
Name of the Data Resource. |
data |
Data to attach, either a data frame or path(s) to CSV file(s):
|
schema |
Either a list, or path or URL to a JSON file describing a Table
Schema for the |
replace |
If |
delim |
Single character used to separate the fields in the CSV file(s),
e.g. |
... |
Additional metadata properties
to add to the resource, e.g. |
See vignette("data-resource") (and to a lesser extent vignette("table-dialect")) to learn how this function implements the Data Package standard.
package with one additional resource.
Other edit functions: remove_resource()
# Load the example Data Package
package <- example_package()

# List the resources
resources(package)

# Create a data frame
df <- data.frame(
  multimedia_id = c(
    "aed5fa71-3ed4-4284-a6ba-3550d1a4de8d",
    "da81a501-8236-4cbd-aa95-4bc4b10a05df"
  ),
  x = c(718, 748),
  y = c(860, 900)
)

# Add the resource "positions" from the data frame
package <- add_resource(package, "positions", data = df)

# Add the resource "positions_with_schema", with a user-defined schema and title
my_schema <- create_schema(df)
package <- add_resource(
  package,
  resource_name = "positions_with_schema",
  data = df,
  schema = my_schema,
  title = "Positions with schema"
)

# Replace the resource "observations" with a file-based resource (2 TSV files)
path_1 <-
  system.file("extdata", "v1", "observations_1.tsv", package = "frictionless")
path_2 <-
  system.file("extdata", "v1", "observations_2.tsv", package = "frictionless")
package <- add_resource(
  package,
  resource_name = "observations",
  data = c(path_1, path_2),
  replace = TRUE,
  delim = "\t"
)

# List the resources ("positions" and "positions_with_schema" added)
resources(package)
Check if an object is a Data Package object with the required properties.
check_package(package)
package |
Data Package object, as returned by |
package invisibly or an error.
# Load the example Data Package
package <- example_package()

# Check if the Data Package is valid (invisible return)
check_package(package)
Initiates a Data Package object, either from scratch or from an existing list. This Data Package object is a list with the following characteristics:
- A datapackage subclass.
- All properties of the original descriptor.
- A resources property, set to an empty list if undefined.
- A directory property, set to "." for the current directory if undefined. It is used as the base path to access resources with read_resource().
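To make the listed characteristics concrete, here is a base R sketch that builds a list with the same shape. It is an illustration only, under the assumption that the defaults behave as described above; sketch_package() is a hypothetical name and this is not how create_package() is implemented.

```r
# Hypothetical illustration (not the frictionless implementation): build a
# list that has the characteristics described for a Data Package object.
sketch_package <- function(descriptor = NULL) {
  # Start from the descriptor, or from an empty list
  package <- if (is.null(descriptor)) list() else descriptor
  # Default "resources" to an empty list if undefined
  if (is.null(package$resources)) package$resources <- list()
  # Default "directory" to the current directory if undefined
  if (is.null(package$directory)) package$directory <- "."
  # Add the "datapackage" subclass in front of the existing class
  class(package) <- c("datapackage", class(package))
  package
}

pkg <- sketch_package(list(name = "my-package"))
inherits(pkg, "datapackage") # TRUE
pkg$directory                # "."
length(pkg$resources)        # 0
```

In practice, call create_package() itself: it also runs check_package() on the result.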
create_package(descriptor = NULL)
descriptor |
List to be made into a Data Package object. If undefined, an empty Data Package will be created from scratch. |
See vignette("data-package") to learn how this function implements the Data Package standard.
check_package() is automatically called on the created package to make sure it is valid.
A Data Package object.
Other create functions: create_schema()
# Create a Data Package
package <- create_package()

package

# See the structure of the (empty) Data Package
str(package)
Creates a Table Schema for a data frame, listing all column names and types as field names and (converted) types.
create_schema(data)
data |
A data frame. |
See vignette("table-schema") to learn how this function implements the Data Package standard.
List describing a Table Schema.
Other create functions: create_package()
# Create a data frame
df <- data.frame(
  id = c(as.integer(1), as.integer(2)),
  timestamp = c(
    as.POSIXct("2020-03-01 12:00:00", tz = "EET"),
    as.POSIXct("2020-03-01 18:45:00", tz = "EET")
  ),
  life_stage = factor(c("adult", "adult"), levels = c("adult", "juvenile"))
)

# Create a Table Schema from the data frame
schema <- create_schema(df)
str(schema)
Reads the example Data Package included in frictionless.
This dataset is used in examples, vignettes, and tests and contains dummy camera trap data organized in 3 Data Resources:
- deployments: one local data file referenced in "path": "deployments.csv".
- observations: two local data files referenced in "path": ["observations_1.tsv", "observations_2.tsv"].
- media: inline data stored in data.
example_package(version = "1.0")
version |
Data Package standard version. |
The example Data Package is available in two versions:
- 1.0: specified as a Data Package v1.
- 2.0: specified as a Data Package v2.
A Data Package object, see create_package().
# Version 1
example_package()

# Version 2
example_package(version = "2.0")
Returns the Table Schema of a Data Resource (in a Data Package), i.e. the content of its schema property, describing the resource's fields, data types, relationships, and missing values.
The resource must be a Tabular Data Resource.
get_schema(package, resource_name)
package |
Data Package object, as returned by |
resource_name |
Name of the Data Resource. |
See vignette("table-schema") to learn more about Table Schema.
List describing a Table Schema.
# Load the example Data Package
package <- example_package()

# Get the Table Schema for the resource "observations"
schema <- get_schema(package, "observations")
str(schema)
Prints a human-readable summary of a Data Package, including its resources and a link to more information (if provided in package$id).
## S3 method for class 'datapackage'
print(x, ...)
x |
Data Package object, as returned by |
... |
Further arguments; they are ignored by this function. |
print() with a summary of the Data Package object.
# Load the example Data Package
package <- example_package()

# Print a summary of the Data Package
package # Or print(package)
Reads information from a datapackage.json file, i.e. the descriptor file that describes the Data Package metadata and its Data Resources.
read_package(file = "datapackage.json")
file |
Path or URL to a |
See vignette("data-package") to learn how this function implements the Data Package standard.
A Data Package object, see create_package().
Other read functions: read_resource(), resources()
# Read a datapackage.json file
package <- read_package(
  system.file("extdata", "v1", "datapackage.json", package = "frictionless")
)

package

# Access the Data Package properties
package$name
package$created
Reads data from a Data Resource (in a Data Package) into a tibble (a Tidyverse data frame).
The resource must be a Tabular Data Resource.
The function uses readr::read_delim() to read CSV files, passing the resource properties path, CSV dialect, column names, data types, etc.
Column names are taken from the provided Table Schema (schema), not from the header in the CSV file(s).
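The point that column names come from the Table Schema rather than the CSV header can be illustrated with a small base R analogue. This sketch uses utils::read.csv() as a stand-in for readr::read_delim(); the schema, the CSV text and the type_map lookup are made up for illustration and do not reflect the package's internals.

```r
# Hypothetical schema and CSV content, for illustration only
schema <- list(fields = list(
  list(name = "observation_id", type = "string"),
  list(name = "count", type = "integer")
))
csv_text <- "ID,N\nobs1,3\nobs2,5\n"

# Take column names and types from the schema, not from the file header
field_names <- vapply(schema$fields, function(f) f$name, character(1))
field_types <- vapply(schema$fields, function(f) f$type, character(1))

# Simplified (assumed) lookup from Table Schema types to R classes
type_map <- c(string = "character", integer = "integer", number = "numeric")

observations <- read.csv(
  text = csv_text,
  header = FALSE, # do not use the header for names
  skip = 1,       # skip the header line
  col.names = field_names,
  colClasses = unname(type_map[field_types])
)

names(observations) # "observation_id" "count"
```

The header line "ID,N" is skipped entirely, so a mismatch between header and schema names does not affect the result.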
read_resource(package, resource_name, col_select = NULL)
package |
Data Package object, as returned by |
resource_name |
Name of the Data Resource. |
col_select |
Character vector of the columns to include in the result, in the order provided. Selecting columns can improve read speed. |
See vignette("data-resource"), vignette("table-dialect") and vignette("table-schema") to learn how this function implements the Data Package standard.
A tibble::tibble() with the Data Resource's tabular data.
If there are parsing problems, a warning will alert you.
You can retrieve the full details by calling problems() on your data frame.
Other read functions: read_package(), resources()
# Read a datapackage.json file
package <- read_package(
  system.file("extdata", "v1", "datapackage.json", package = "frictionless")
)

package

# Read data from the resource "observations"
read_resource(package, "observations")

# The above tibble is merged from 2 files listed in the resource path
package$resources[[2]]$path

# The column names and types are derived from the resource schema
purrr::map_chr(package$resources[[2]]$schema$fields, "name")
purrr::map_chr(package$resources[[2]]$schema$fields, "type")

# Read data from the resource "deployments" with column selection
read_resource(package, "deployments", col_select = c("latitude", "longitude"))
Removes a Data Resource from a Data Package, i.e. it removes one of the described resources.
remove_resource(package, resource_name)
package |
Data Package object, as returned by |
resource_name |
Name of the Data Resource. |
package with one fewer resource.
Other edit functions: add_resource()
# Load the example Data Package
package <- example_package()

# List the resources
resources(package)

# Remove the resource "observations"
package <- remove_resource(package, "observations")

# List the resources ("observations" removed)
resources(package)
Lists the names of the Data Resources included in a Data Package.
resources(package)
package |
Data Package object, as returned by |
Character vector with the Data Resource names.
Other read functions: read_package(), read_resource()
# Load the example Data Package
package <- example_package()

# List the resources
resources(package)
Writes a Data Package and its related Data Resources to disk as a datapackage.json and CSV files.
Already existing CSV files of the same name will not be overwritten.
The function can also be used to download a Data Package in its entirety.
The Data Resources are handled as follows:
- Resource path has at least one local path (e.g. deployments.csv): CSV files are copied or downloaded to directory and path points to new location of file(s).
- Resource path has only URL(s): resource stays as is.
- Resource has inline data originally: resource stays as is.
- Resource has inline data as result of adding data with add_resource(): data are written to a CSV file using readr::write_csv(), path points to location of file, data property is removed. Use compress = TRUE to gzip those CSV files.
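The four cases above amount to a small decision rule. The base R sketch below restates that rule for illustration; it is not the package's code. The helper names is_url() and plan_resource_write() and the added_with_add_resource flag are hypothetical, since the documentation does not say how write_package() tells original inline data apart from data added with add_resource().

```r
# Hypothetical helper: TRUE for paths that look like URLs
is_url <- function(path) grepl("^(http|https|ftp|sftp)://", path)

# Hypothetical restatement of the handling rules described above
plan_resource_write <- function(resource) {
  if (!is.null(resource$path)) {
    if (all(is_url(resource$path))) {
      "keep as is (URLs only)"
    } else {
      "copy or download files to directory and update path"
    }
  } else if (isTRUE(resource$added_with_add_resource)) {
    # Assumed flag, for illustration only
    "write data to CSV, set path, remove data"
  } else {
    "keep as is (original inline data)"
  }
}

plan_resource_write(list(path = "deployments.csv"))
plan_resource_write(list(path = c("https://example.org/a.csv")))
plan_resource_write(list(data = list(list(id = 1))))
plan_resource_write(list(data = data.frame(id = 1), added_with_add_resource = TRUE))
```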
write_package(package, directory, compress = FALSE)
package |
Data Package object, as returned by |
directory |
Path to local directory to write files to. |
compress |
If |
package invisibly, as written to file.
# Load the example Data Package from disk
package <- read_package(
  system.file("extdata", "v1", "datapackage.json", package = "frictionless")
)

package

# Write the (unchanged) Data Package to disk
write_package(package, directory = "my_directory")

# Check files
list.files("my_directory")

# No files written for the "observations" resource, since those are all URLs.
# No files written for the "media" resource, since it has inline data.

# Clean up (don't do this if you want to keep your files)
unlink("my_directory", recursive = TRUE)