--- title: "Data Package" author: "Peter Desmet" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Data Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` [Data Package](https://specs.frictionlessdata.io/data-package/) is a simple container format to describe a coherent collection of data (a dataset), including its contributors, licenses, etc. ::: {.callout-info} In this document we use the terms "package" for Data Package, "resource" for Data Resource, "dialect" for Table Dialect, and "schema" for Table Schema. ::: ## General implementation Frictionless supports reading, manipulating and writing packages. Much of its functionality is focused on manipulating resources (see `vignette("data-resource")`). ### Read `read_package()` reads a package from `datapackage.json` file (path or URL): ```{r} library(frictionless) file <- system.file("extdata", "v1", "datapackage.json", package = "frictionless") package <- read_package(file) ``` `print.datapackage()` prints a human-readable summary of a package: ```{r} package ``` ### Manipulate A package is a list, with all the properties that were present in the `datapackage.json` file (e.g. `name`, `id`, etc.). Frictionless adds the custom property `"directory"` to support reading data (which is removed when writing to disk) and extends the class with `"datapackage"` to support printing and checking: ```{r} attributes(package) ``` `create_package()` creates a package from scratch or from an existing package. It adds the required properties and class if those are missing: ```{r} # From scratch create_package() # From an existing package create_package(package) ``` `check_package()` checks if a package contains the required properties and class: ```{r, error = TRUE, purl = FALSE} invalid_package <- example_package() invalid_package$resources <- NULL check_package(invalid_package) ``` You can manipulate the package list, but frictionless does not provide functions to do that. Use `{purrr}` or base R instead (see `vignette("frictionless")`). ::: {.callout-warning} Some functions (e.g. `unclass()` or `append()`) remove the custom class, creating an invalid package. You can fix this by calling `create_package()` on your package. ::: Most functions have `package` as their first argument and return package. This allows you to **pipe the functions**: ```{r, message = FALSE} library(dplyr) # Or library(magrittr) my_package <- create_package() %>% add_resource(resource_name = "iris", data = iris) %>% append(c("title" = "my_package"), after = 0) %>% create_package() # To add the datapackage class again my_package ``` ### Write `write_package()` writes a package to disk as a `datapackage.json` file. For some resources, it also writes the data files. See the function documentation and `vignette("data-resource")` for details. ## Properties implementation ### resources [`resources`](https://specs.frictionlessdata.io/data-package/#resource-information) is required. It is used by `resources()` and many other functions. `check_package()` returns an error if it is missing. ### profile [`profile`](https://specs.frictionlessdata.io/data-package/#profile) is ignored by `read_package()` and not set (to e.g. `"tabular-data-package"`) by `create_package()`. ### name [`name`](https://specs.frictionlessdata.io/data-package/#name) is ignored by `read_package()` and not set by `create_package()`. ### id [`id`](https://specs.frictionlessdata.io/data-package/#id) is ignored by `read_package()` and not set by `create_package()`. `print.datapackage()` adds an extra sentence when `id` is a URL (like a DOI): ```{r} package <- example_package() package$id <- "https://doi.org/10.5281/zenodo.10053702/" package ``` ### licenses [`licenses`](https://specs.frictionlessdata.io/data-package/#licenses) is ignored by `read_package()` and not set by `create_package()`. ### title [`title`](https://specs.frictionlessdata.io/data-package/#title) is ignored by `read_package()` and not set by `create_package()`. ### description [`description`](https://specs.frictionlessdata.io/data-package/#description) is ignored by `read_package()` and not set by `create_package()`. ### homepage [`homepage`](https://specs.frictionlessdata.io/data-package/#homepage) is ignored by `read_package()` and not set by `create_package()`. ### image [`image`](https://specs.frictionlessdata.io/data-package/#image) is ignored by `read_package()` and not set by `create_package()`. ### version [`version`](https://specs.frictionlessdata.io/data-package/#version) is ignored by `read_package()` and not set by `create_package()`. ### created [`created`](https://specs.frictionlessdata.io/data-package/#created) is ignored by `read_package()` and not set by `create_package()`. ### keywords [`keywords`](https://specs.frictionlessdata.io/data-package/#keywords) is ignored by `read_package()` and not set by `create_package()`. ### contributors [`contributors`](https://specs.frictionlessdata.io/data-package/#contributors) is ignored by `read_package()` and not set by `create_package()`. ### sources [`sources`](https://specs.frictionlessdata.io/data-package/#sources) is ignored by `read_package()` and not set by `create_package()`.