Data Package is a simple container format to describe a coherent collection of data (a dataset), including its contributors, licenses, etc.
In this document we use the terms “package” for Data Package, “resource” for Data Resource, “dialect” for Table Dialect, and “schema” for Table Schema.
Frictionless supports reading, manipulating and writing packages. Much of its functionality is focused on manipulating resources (see vignette("data-resource")).
read_package() reads a package from a datapackage.json file (path or URL):
library(frictionless)
file <- system.file("extdata", "v1", "datapackage.json", package = "frictionless")
package <- read_package(file)
print.datapackage() prints a human-readable summary of a package.
A package is a list, with all the properties that were present in the datapackage.json file (e.g. name, id, etc.). Frictionless adds the custom property "directory" to support reading data (which is removed when writing to disk) and extends the class with "datapackage" to support printing and checking:
attributes(package)
#> $names
#> [1] "name" "id" "licenses" "image" "version" "created"
#> [7] "temporal" "resources" "directory"
#>
#> $class
#> [1] "datapackage" "list"
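Because a package is a plain list with an extra class, its properties can be inspected with ordinary list operations. The following is a minimal sketch, assuming the bundled example package returned by example_package() is the same one read above:

```r
library(frictionless)

# Load the example package shipped with frictionless
package <- example_package()

# Access a property like any other list element
package$name

# Count the resources described in the package
length(package$resources)

# The custom class is what the print and check functions rely on
inherits(package, "datapackage")
```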
create_package() creates a package from scratch or from an existing package. It adds the required properties and class if those are missing:
# From scratch
create_package()
#> A Data Package with 0 resources.
#> Use `unclass()` to print the Data Package as a list.
# From an existing package
create_package(package)
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.
check_package() checks if a package contains the required properties and class:
invalid_package <- example_package()
invalid_package$resources <- NULL
check_package(invalid_package)
#> Error in `check_package()`:
#> ! `package` must be a Data Package object.
#> ✖ `package` is missing a resources property or it is not a list.
#> ℹ Create a valid Data Package object with `read_package()` or
#> `create_package()`.
You can manipulate the package list, but frictionless does not provide functions to do that. Use {purrr} or base R instead (see vignette("frictionless")).
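As an illustration of manipulating the list with base R, the sketch below sets two top-level properties by assignment. The title and keywords values are hypothetical and chosen only for the example:

```r
library(frictionless)

package <- example_package()

# Add or overwrite top-level properties with base R list assignment
package$title <- "Example package"                 # hypothetical value
package$keywords <- c("camera traps", "example")   # hypothetical values

# `$<-` keeps the object's attributes, so the datapackage class is preserved
class(package)

# The package should therefore still pass the check
check_package(package)
```

Assignment with `$<-` is convenient here because, unlike append(), it does not drop the class.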
Some functions (e.g. unclass() or append()) remove the custom class, creating an invalid package. You can fix this by calling create_package() on your package.
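The loss and repair of the class can be sketched as follows. This is a small illustration using the example package; the variable names stripped and restored are only for the example:

```r
library(frictionless)

package <- example_package()

# unclass() returns a bare list without the datapackage class
stripped <- unclass(package)
inherits(stripped, "datapackage")

# create_package() adds the class (and any missing required properties) back
restored <- create_package(stripped)
inherits(restored, "datapackage")

# The restored object should pass the check again
check_package(restored)
```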
Most functions have package as their first argument and return package. This allows you to pipe the functions:
library(dplyr) # Or library(magrittr)
my_package <-
  create_package() %>%
  add_resource(resource_name = "iris", data = iris) %>%
  append(c("title" = "my_package"), after = 0) %>%
  create_package() # To add the datapackage class again
my_package
#> A Data Package with 1 resource:
#> • iris
#> Use `unclass()` to print the Data Package as a list.
write_package() writes a package to disk as a datapackage.json file. For some resources, it also writes the data files. See the function documentation and vignette("data-resource") for details.
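A minimal sketch of writing a package and reading it back, assuming a temporary directory is an acceptable target. The directory name my_package and the variable names are hypothetical:

```r
library(frictionless)

# Build a small package with one resource created from a data frame
package <- add_resource(create_package(), resource_name = "iris", data = iris)

# Write to a temporary directory (hypothetical location for this example)
dir <- file.path(tempdir(), "my_package")
write_package(package, dir)

# List the files that were written
list.files(dir)

# Read the written descriptor back in
reread <- read_package(file.path(dir, "datapackage.json"))
```

Since the resource was added from a data frame, this is one of the cases where write_package() also writes a data file next to datapackage.json.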
resources is required. It is used by resources() and many other functions. check_package() returns an error if it is missing.
profile is ignored by read_package() and not set (to e.g. "tabular-data-package") by create_package().
name is ignored by read_package() and not set by create_package().
id is ignored by read_package() and not set by create_package(). print.datapackage() adds an extra sentence when id is a URL (like a DOI):
package <- example_package()
package$id <- "https://doi.org/10.5281/zenodo.10053702/"
package
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> For more information, see <https://doi.org/10.5281/zenodo.10053702/>.
#> Use `unclass()` to print the Data Package as a list.
licenses is ignored by read_package() and not set by create_package().
title is ignored by read_package() and not set by create_package().
description is ignored by read_package() and not set by create_package().
homepage is ignored by read_package() and not set by create_package().
image is ignored by read_package() and not set by create_package().
version is ignored by read_package() and not set by create_package().
created is ignored by read_package() and not set by create_package().
keywords is ignored by read_package() and not set by create_package().
contributors is ignored by read_package() and not set by create_package().
sources is ignored by read_package() and not set by create_package().