Package 'git2rdata' reference manual

Title:	Store and Retrieve Data.frames in a Git Repository
Description:	The git2rdata package is an R package for writing and reading dataframes as plain text files. A metadata file stores important information. 1) Storing metadata preserves the classes of variables. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human-readable. Users who prefer a human-readable format over smaller files can turn it off. Details on the implementation are available in vignette("plain_text", package = "git2rdata"). 2) Storing metadata also allows smaller row-based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette("version_control", package = "git2rdata"). Although we designed git2rdata with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette("workflow", package = "git2rdata") gives a toy example. 4) vignette("efficiency", package = "git2rdata") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.
Authors:	Thierry Onkelinx [aut, cre] (<https://orcid.org/0000-0001-8804-4216>, Research Institute for Nature and Forest (INBO)), Floris Vanderhaeghe [ctb] (<https://orcid.org/0000-0002-6378-6229>, Research Institute for Nature and Forest (INBO)), Peter Desmet [ctb] (<https://orcid.org/0000-0002-8442-8025>, Research Institute for Nature and Forest (INBO)), Els Lommelen [ctb] (<https://orcid.org/0000-0002-3481-5684>, Research Institute for Nature and Forest (INBO)), Research Institute for Nature and Forest (INBO) [cph, fnd]
Maintainer:	Thierry Onkelinx <[email protected]>
License:	GPL-3
Version:	0.5.0
Built:	2025-01-24 19:20:28 UTC
Source:	https://github.com/ropensci/git2rdata

Re-exported Function From `git2r`

Description

See commit in git2r.

Create a Data Package for a directory of CSV files

Description

Create a datapackage.json file for a directory of CSV files. The function will look for all .csv files in the directory and its subdirectories. It will then create a datapackage.json file with the metadata of each CSV file.

Usage

data_package(path = ".")

Arguments

path

the directory in which to create the datapackage.json file.

Display metadata for a `git2rdata` object

Description

Display metadata for a git2rdata object

Usage

display_metadata(x, minimal = FALSE)

Arguments

`x`	a `git2rdata` object
`minimal`	logical, if `TRUE` only a message is displayed

Check Whether a Git2rdata Object is Valid.

Description

A valid git2rdata object has valid metadata.

Usage

is_git2rdata(file, root = ".", message = c("none", "warning", "error"))

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`message`	a single value indicating the type of messages on top of the logical value. `"none"`: no messages, `"warning"`: issue a warning in case of an invalid metadata file. `"error"`: an invalid metadata file results in an error. Defaults to `"none"`.

Value

A logical value. TRUE in case of a valid git2rdata object. Otherwise FALSE.
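The message argument changes only how an invalid object is reported. It does not change the logical value that is returned. The following is a minimal sketch that assumes git2rdata is installed. Deleting iris.yml is used here only as a convenient way to produce an invalid object.

```r
library(git2rdata)

# scratch directory with one git2rdata object, then delete its metadata
root <- tempfile("git2rdata-")
dir.create(root)
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length", digits = 6)
junk <- file.remove(file.path(root, "iris.yml"))

# message = "none" (default): only the logical value
quiet <- is_git2rdata("iris", root)

# message = "warning": the same logical value plus a warning; collect it
warnings_seen <- character(0)
loud <- withCallingHandlers(
  is_git2rdata("iris", root, message = "warning"),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# message = "error": the check stops with an error instead of returning FALSE
failure <- tryCatch(
  is_git2rdata("iris", root, message = "error"),
  error = function(e) e
)
```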

Examples

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# store a file
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# check the stored file
is_git2rmeta("iris", root)
is_git2rdata("iris", root)

# Remove the metadata from the existing git2rdata object. Then it stops
# being a git2rdata object.
junk <- file.remove(file.path(root, "iris.yml"))
is_git2rmeta("iris", root)
is_git2rdata("iris", root)

# recreate the object, then remove the data but keep the metadata. It stops
# being a git2rdata object, but the metadata remains valid.
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
junk <- file.remove(file.path(root, "iris.tsv"))
is_git2rmeta("iris", root)
is_git2rdata("iris", root)

Check Whether a Git2rdata Object Has Valid Metadata.

Description

Valid metadata is a file with a .yml extension. It has a top-level item ..generic. This item contains git2rdata (the version number), hash (a hash of the metadata) and data_hash (a hash of the data file). The version number must be the current version.
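Because the metadata file is plain YAML, you can inspect these items directly. The sketch below assumes the yaml package is available; the examples in this manual already call yaml::write_yaml(). It also assumes that each column gets its own top-level entry next to ..generic. That is an assumption about the file layout and is not stated above.

```r
library(git2rdata)

root <- tempfile("git2rdata-")
dir.create(root)
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length", digits = 6)

# read the metadata file as an ordinary YAML document
metadata <- yaml::read_yaml(file.path(root, "iris.yml"))
generic <- metadata[["..generic"]]

# the three items named in the description
generic[c("git2rdata", "hash", "data_hash")]
# top-level items: ..generic plus (assumed) one entry per column
names(metadata)
```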

Usage

is_git2rmeta(file, root = ".", message = c("none", "warning", "error"))

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`message`	a single value indicating the type of messages on top of the logical value. `"none"`: no messages, `"warning"`: issue a warning in case of an invalid metadata file. `"error"`: an invalid metadata file results in an error. Defaults to `"none"`.

Value

A logical value. TRUE in case of a valid metadata file. Otherwise FALSE.

Examples

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# store a file
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# check the stored file
is_git2rmeta("iris", root)
is_git2rdata("iris", root)

# Remove the metadata from the existing git2rdata object. Then it stops
# being a git2rdata object.
junk <- file.remove(file.path(root, "iris.yml"))
is_git2rmeta("iris", root)
is_git2rdata("iris", root)

# recreate the object, then remove the data but keep the metadata. It stops
# being a git2rdata object, but the metadata remains valid.
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
junk <- file.remove(file.path(root, "iris.tsv"))
is_git2rmeta("iris", root)
is_git2rdata("iris", root)

List Available Git2rdata Files Containing Data

Description

The function returns the names of all valid git2rdata objects, that is, .tsv files with a matching valid metadata file (.yml). Invalid metadata files result in a warning. The function ignores valid metadata files without matching raw data (.tsv).
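The filtering rules above can be shown side by side in a single directory. This is a minimal sketch. The object names orphan and plain are illustrative. The check uses basename() so that it does not depend on how list_data() formats relative paths.

```r
library(git2rdata)

root <- tempfile("git2rdata-")
dir.create(root)

# valid object: iris.tsv plus iris.yml
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length", digits = 6)
# metadata without data: write a second object, then delete its .tsv
junk <- write_vc(iris[7:12, ], "orphan", root, sorting = "Sepal.Length", digits = 6)
junk <- file.remove(file.path(root, "orphan.tsv"))
# data without metadata: a plain tab-separated file
write.table(iris, file = file.path(root, "plain.tsv"), sep = "\t")

found <- list_data(root)
found  # only the complete object is listed
```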

Usage

list_data(root = ".", path = ".", recursive = TRUE)

Arguments

`root`	the `root` of the repository. Either a path or a `git-repository`
`path`	relative `path` from the `root`. Defaults to the `root`
`recursive`	logical. Should the listing recurse into directories?

Value

A character vector of git2rdata object names, including their relative path.

Examples

## on file system

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# store a dataframe as git2rdata object. Capture the result to minimise
# screen output
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# write a standard tab-separated file (non git2rdata object)
write.table(iris, file = file.path(root, "standard.tsv"), sep = "\t")
# write a YAML file
yml <- list(
  authors = list(
   "Research Institute for Nature and Forest" = list(
       href = "https://www.inbo.be/en")))
yaml::write_yaml(yml, file = file.path(root, "_pkgdown.yml"))

# list the git2rdata objects
list_data(root)
# list the files
list.files(root, recursive = TRUE)

# remove all .tsv files from valid git2rdata objects
rm_data(root, path = ".")
# check the removal of the .tsv file
list.files(root, recursive = TRUE)
list_data(root)

# remove dangling git2rdata metadata files
prune_meta(root, path = ".")
# check the removal of the metadata
list.files(root, recursive = TRUE)
list_data(root)


## on git repo

# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")

# store a dataframe
write_vc(iris[1:6, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE)
# check that the dataframe is stored
status(repo)
list_data(repo)

# commit the current version and check the git repo
commit(repo, "add iris data", session = TRUE)
status(repo)

# remove the data files from the repo
rm_data(repo, path = ".")
# check the removal
list_data(repo)
status(repo)

# remove dangling metadata
prune_meta(repo, path = ".")
# check the removal
list_data(repo)
status(repo)

Optimize an Object for Storage as Plain Text and Add Metadata

Description

Prepares a vector for storage. When relevant, meta() optimizes the object for storage by changing its format to one that needs fewer characters. The metadata stored in the meta attribute contains all the information required to back-transform the optimized format into the original format.
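The difference between the optimized and the verbose representation is easiest to see on a single value. This small sketch uses a Date. The exact storage type chosen for the optimized value is treated as an implementation detail, so the code only checks that it is numeric.

```r
library(git2rdata)

when <- as.Date("2019-02-01")
compact <- meta(when)                    # optimize = TRUE (default)
readable <- meta(when, optimize = FALSE) # human-readable text

class(compact)   # a number instead of a Date
class(readable)  # a character string
# the information needed to back-transform the value
str(attr(compact, "meta"))
```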

In case of a data.frame, meta() applies itself to each of the columns. The meta attribute becomes a named list containing the metadata for each column plus an additional ..generic element. ..generic is a reserved name for the metadata and is not allowed as a column name in a data.frame.
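Applied to a data.frame, meta() returns the recoded columns together with the combined metadata. In this minimal sketch, digits = 6 is passed explicitly to avoid the default-digits warning.

```r
library(git2rdata)

x <- meta(iris[1:6, ], sorting = "Sepal.Length", digits = 6)
m <- attr(x, "meta")

names(m)          # ..generic plus one element per column
class(x$Species)  # the optimized factor is no longer a factor
x$Sepal.Length    # rows follow the sorting
```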

write_vc() uses this function to prepare a dataframe for storage. Existing metadata is passed through the optional old argument. This argument is intended for internal use.

Usage

meta(x, ..., digits)

## S3 method for class 'character'
meta(x, na = "NA", optimize = TRUE, ...)

## S3 method for class 'factor'
meta(x, optimize = TRUE, na = "NA", index, strict = TRUE, ...)

## S3 method for class 'logical'
meta(x, optimize = TRUE, ...)

## S3 method for class 'POSIXct'
meta(x, optimize = TRUE, ...)

## S3 method for class 'Date'
meta(x, optimize = TRUE, ...)

## S3 method for class 'data.frame'
meta(
  x,
  optimize = TRUE,
  na = "NA",
  sorting,
  strict = TRUE,
  split_by = character(0),
  ...,
  digits
)

Arguments

`x`	the vector.
`...`	further arguments to the methods.
`digits`	The number of significant digits of the smallest absolute value. The function applies the rounding automatically. Only relevant for numeric variables. Either a single positive integer or a named vector where the names link to the variables in the `data.frame`. Defaults to `6` with a warning.
`na`	the string to use for missing values in the data.
`optimize`	If `TRUE`, recode the data to get smaller text files. If `FALSE`, `meta()` converts the data to character. Defaults to `TRUE`.
`index`	An optional named vector with existing factor indices. The names must match the existing factor levels. Unmatched levels from `x` will get new indices.
`strict`	What to do when the metadata changes. `strict = FALSE` overwrites the data and the metadata with a warning listing the changes, `strict = TRUE` returns an error and leaves the data and metadata as is. Defaults to `TRUE`.
`sorting`	an optional vector of column names defining which columns to use for sorting `x` and in what order to use them. The default empty `sorting` yields a warning. Add `sorting` to avoid this warning. Strongly recommended in combination with version control. See `vignette("efficiency", package = "git2rdata")` for an illustration of the importance of sorting.
`split_by`	An optional vector of variable names by which to split the text files. This creates a separate file for every combination of their values. We prepend these variables to the vector of `sorting` variables.
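The following sketch shows split_by. It assumes that write_vc() passes split_by through in the same way as meta(). The counts data and its column names are made up for illustration. The exact file names inside the split directory are not documented here, so the code only lists them.

```r
library(git2rdata)

root <- tempfile("git2rdata-")
dir.create(root)

counts <- data.frame(
  site = factor(c("A", "A", "B", "B", "C")),
  year = c(2020L, 2021L, 2020L, 2021L, 2020L),
  count = c(3L, 5L, 2L, 8L, 1L)
)
# one data file per site; "site" is prepended to sorting, so the
# effective sort order is site, then year
junk <- write_vc(counts, "counts", root, sorting = "year", split_by = "site")
list.files(root, recursive = TRUE)

# read_vc() reassembles the pieces into a single data.frame
back <- read_vc("counts", root)
```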

Value

The optimized vector x with a meta attribute.

Note

The default order of factor levels depends on the current locale. See Comparison for more details on that. The same code on a different locale might result in a different sorting. meta() ignores, with a warning, any change in the order of factor levels. Add strict = FALSE to enforce the new order of factor levels.
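The same strict switch governs write_vc(). This minimal sketch assumes that adding a factor level counts as a metadata change, which write_vc() refuses under the default strict = TRUE. The object name trial is illustrative.

```r
library(git2rdata)

root <- tempfile("git2rdata-")
dir.create(root)

first <- data.frame(id = 1:3, treatment = factor(c("a", "b", "a")))
junk <- write_vc(first, "trial", root, sorting = "id")

# the new version introduces an extra factor level "c"
second <- data.frame(id = 1:3, treatment = factor(c("a", "b", "c")))

# strict = TRUE (default): the metadata change is refused with an error
refused <- tryCatch(
  write_vc(second, "trial", root, sorting = "id"),
  error = function(e) e
)
inherits(refused, "error")

# strict = FALSE: data and metadata are overwritten; the warning listing
# the changes is suppressed here to keep the output short
junk <- suppressWarnings(
  write_vc(second, "trial", root, sorting = "id", strict = FALSE)
)
levels(read_vc("trial", root)$treatment)
```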

Examples

meta(c(NA, "'NA'", '"NA"', "abc\tdef", "abc\ndef"))
meta(1:3)
meta(seq(1, 3, length.out = 4), digits = 6)
meta(factor(c("b", NA, "NA"), levels = c("NA", "b", "c")))
meta(factor(c("b", NA, "a"), levels = c("a", "b", "c")), optimize = FALSE)
meta(factor(c("b", NA, "a"), levels = c("a", "b", "c"), ordered = TRUE))
meta(
  factor(c("b", NA, "a"), levels = c("a", "b", "c"), ordered = TRUE),
  optimize = FALSE
)
meta(c(FALSE, NA, TRUE))
meta(c(FALSE, NA, TRUE), optimize = FALSE)
meta(complex(real = c(1, NA, 2), imaginary = c(3, NA, -1)))
meta(as.POSIXct("2019-02-01 10:59:59", tz = "CET"))
meta(as.POSIXct("2019-02-01 10:59:59", tz = "CET"), optimize = FALSE)
meta(as.Date("2019-02-01"))
meta(as.Date("2019-02-01"), optimize = FALSE)

Print method for `git2rdata` objects.

Description

Prints the data and the description of the columns when available.

Usage

## S3 method for class 'git2rdata'
print(x, ...)

Arguments

`x`	a `git2rdata` object
`...`	additional arguments passed to `print`

Prune Metadata Files

Description

Removes all valid metadata (.yml files) from the path when they don't have accompanying data (.tsv file). Invalid metadata triggers a warning without removing the metadata file.

Use this function with caution, since it removes all valid metadata files without asking for confirmation. We strongly recommend using this function on files under version control. See vignette("workflow", package = "git2rdata") for some examples on how to use it.
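Because prune_meta() does not ask for confirmation, it can help to list the candidates first. The helper dangling_meta() below is hypothetical and is not part of git2rdata. It only looks at the top level of root and assumes that the objects were written without split_by.

```r
library(git2rdata)

# hypothetical helper: valid .yml files in root without a matching .tsv,
# i.e. the files prune_meta() would remove (top level only, no split_by)
dangling_meta <- function(root) {
  yml <- list.files(root, pattern = "\\.yml$")
  has_data <- file.exists(file.path(root, sub("\\.yml$", ".tsv", yml)))
  valid <- vapply(sub("\\.yml$", "", yml), is_git2rmeta, logical(1), root = root)
  yml[valid & !has_data]
}

root <- tempfile("git2rdata-")
dir.create(root)
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length", digits = 6)
junk <- file.remove(file.path(root, "iris.tsv"))

dangling_meta(root)                       # inspect first ...
removed <- prune_meta(root, path = ".")   # ... then prune
removed                                   # returned invisibly
```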

Usage

prune_meta(root = ".", path = NULL, recursive = TRUE, ...)

## S3 method for class 'git_repository'
prune_meta(root, path = NULL, recursive = TRUE, ..., stage = FALSE)

Arguments

`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`path`	the directory in which to clean all the metadata files. The directory is relative to `root`.
`recursive`	remove files in subdirectories too.
`...`	parameters used in some methods
`stage`	stage the changes after removing the files. Defaults to `FALSE`.

Value

Invisibly returns a vector of the removed file names. The paths are relative to root.

Examples

## on file system

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# store a dataframe as git2rdata object. Capture the result to minimise
# screen output
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# write a standard tab-separated file (non git2rdata object)
write.table(iris, file = file.path(root, "standard.tsv"), sep = "\t")
# write a YAML file
yml <- list(
  authors = list(
   "Research Institute for Nature and Forest" = list(
       href = "https://www.inbo.be/en")))
yaml::write_yaml(yml, file = file.path(root, "_pkgdown.yml"))

# list the git2rdata objects
list_data(root)
# list the files
list.files(root, recursive = TRUE)

# remove all .tsv files from valid git2rdata objects
rm_data(root, path = ".")
# check the removal of the .tsv file
list.files(root, recursive = TRUE)
list_data(root)

# remove dangling git2rdata metadata files
prune_meta(root, path = ".")
# check the removal of the metadata
list.files(root, recursive = TRUE)
list_data(root)


## on git repo

# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")

# store a dataframe
write_vc(iris[1:6, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE)
# check that the dataframe is stored
status(repo)
list_data(repo)

# commit the current version and check the git repo
commit(repo, "add iris data", session = TRUE)
status(repo)

# remove the data files from the repo
rm_data(repo, path = ".")
# check the removal
list_data(repo)
status(repo)

# remove dangling metadata
prune_meta(repo, path = ".")
# check the removal
list_data(repo)
status(repo)

Re-exported Function From `git2r`

Description

See pull in git2r.

Re-exported Function From `git2r`

Description

See push in git2r.

Read a Git2rdata Object from Disk

Description

read_vc() handles git2rdata objects stored by write_vc(). It reads and verifies the metadata file (.yml). Then it reads and verifies the raw data. The last step is back-transforming any transformation done by meta() to return the data.frame as stored by write_vc().
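A round trip through write_vc() and read_vc() should return the same values and classes. This is a minimal sketch. Because read_vc() returns the rows in the stored (sorted) order, the original is sorted the same way before the comparison.

```r
library(git2rdata)

root <- tempfile("git2rdata-")
dir.create(root)

original <- iris[1:6, ]
junk <- write_vc(original, "iris", root, sorting = "Sepal.Length", digits = 6)
restored <- read_vc("iris", root)

# sort the original like the stored data before comparing the values
expected <- original[order(original$Sepal.Length), ]
all.equal(as.data.frame(restored), expected, check.attributes = FALSE)
class(restored)
```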

read_vc() is an S3 generic on root which currently handles "character" (a path) and "git-repository" (from git2r). S3 methods for other version control systems could be added.
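Because read_vc() dispatches on root, the same object can be read through a plain path or through a git2r repository object. This minimal sketch assumes git2r is installed. No commit is needed, because, as in the examples of this function, read_vc() reads from the working directory.

```r
library(git2rdata)

repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)

junk <- write_vc(iris[1:6, ], "iris", repo, sorting = "Sepal.Length", digits = 6)

# the same object, reached through a character path and a git_repository
via_path <- read_vc("iris", repo_path)
via_repo <- read_vc("iris", repo)
```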

Usage

read_vc(file, root = ".")

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).

Value

The data.frame with the file names and hashes as attributes. It has the additional class "git2rdata" to support extra methods to display the descriptions.

Examples

## on file system

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# write a dataframe to the directory
write_vc(
  iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length",
  digits = 6
)
# check that a data file (.tsv) and a metadata file (.yml) exist.
list.files(root, recursive = TRUE)
# read the git2rdata object from the directory
read_vc("iris", root)

# store a new version with different observations but the same metadata
write_vc(iris[1:5, ], "iris", root)
list.files(root, recursive = TRUE)
# Removing a column requires new metadata.
# Add strict = FALSE to override the existing metadata.
write_vc(
  iris[1:6, -2], "iris", root, sorting = "Sepal.Length", strict = FALSE
)
list.files(root, recursive = TRUE)
# storing the original version again requires another update of the metadata
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Width", strict = FALSE)
list.files(root, recursive = TRUE)
# optimize = FALSE stores the data more verbosely. This requires larger files.
write_vc(
  iris[1:6, ], "iris2", root, sorting = "Sepal.Width", optimize = FALSE
)
list.files(root, recursive = TRUE)



## on git repo using a git2r::git-repository

# initialise a git repo using the git2r package
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")

# store a dataframe in git repo.
write_vc(iris[1:6, ], file = "iris", root = repo, sorting = "Sepal.Length")
# This git2rdata object is not staged by default.
status(repo)
# read a dataframe from a git repo
read_vc("iris", repo)

# store a new version in the git repo and stage it in one go
write_vc(iris[1:5, ], "iris", repo, stage = TRUE)
status(repo)

# store a verbose version in a different git2rdata object
write_vc(
  iris[1:6, ], "iris2", repo, sorting = "Sepal.Width", optimize = FALSE
)
status(repo)

Retrieve the Most Recent File Change

Description

Retrieve the most recent commit that added or updated a file or git2rdata object. This does not imply that the file still exists at the current HEAD, as it ignores file deletions.

Use this information to document the current version of a file or git2rdata object in an analysis. Since it refers to the most recent change of this file, it remains unchanged by committing changes to other files. You can also use it to track whether the data got updated, requiring an analysis to be rerun. See vignette("workflow", package = "git2rdata").
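The rerun check described above can be sketched by recording the commit that recent_commit() returns and comparing it later. This assumes git2r is installed. The user name and e-mail are placeholders. The one-second pause mirrors the Sys.sleep(1.1) calls in this function's examples, because git timestamps have one-second resolution.

```r
library(git2rdata)

repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
# placeholder identity, only needed to create commits
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")

junk <- write_vc(
  iris[1:6, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE, digits = 6
)
junk <- git2r::commit(repo, "add iris data", session = TRUE)
# record the commit the analysis used, e.g. alongside its results
used <- recent_commit("iris", repo, data = TRUE)$commit

Sys.sleep(1.1) # git timestamps have a resolution of one second
junk <- write_vc(
  iris[7:12, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE, digits = 6
)
junk <- git2r::commit(repo, "update iris data", session = TRUE)

# later: compare the recorded commit with the most recent change
latest <- recent_commit("iris", repo, data = TRUE)$commit
needs_rerun <- !identical(used, latest)
needs_rerun
```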

Usage

recent_commit(file, root, data = FALSE)

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`.
`data`	does `file` refer to a data object (`TRUE`) or to a file (`FALSE`)? Defaults to `FALSE`.

Value

a data.frame with commit, author and when for the most recent commit that adds or updates the file.

Examples

# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")

# write and commit a first dataframe
# store the output of write_vc() to minimize screen output
junk <- write_vc(
  iris[1:6, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE,
  digits = 6
)
commit(repo, "important analysis", session = TRUE)
list.files(repo_path)
Sys.sleep(1.1) # required because git doesn't handle subsecond timings

# write and commit a second dataframe
junk <- write_vc(
  iris[7:12, ], "iris2", repo, sorting = "Sepal.Length", stage = TRUE,
  digits = 6
)
commit(repo, "important analysis", session = TRUE)
list.files(repo_path)
Sys.sleep(1.1) # required because git doesn't handle subsecond timings

# write and commit a new version of the first dataframe
junk <- write_vc(iris[7:12, ], "iris", repo, stage = TRUE)
list.files(repo_path)
commit(repo, "important analysis", session = TRUE)



# find out in which commit a file was last changed

# "iris.tsv" was last updated in the third commit
recent_commit("iris.tsv", repo)
# "iris.yml" was last updated in the first commit
recent_commit("iris.yml", repo)
# "iris2.yml" was last updated in the second commit
recent_commit("iris2.yml", repo)
# the git2rdata object "iris" was last updated in the third commit
recent_commit("iris", repo, data = TRUE)

# remove a dataframe and commit it to see what happens with deleted files
file.remove(file.path(repo_path, "iris.tsv"))
prune_meta(repo, ".")
commit(repo, message = "remove iris", all = TRUE, session = TRUE)
list.files(repo_path)

# still points to the third commit as this is the latest commit in which the
# data was present
recent_commit("iris", repo, data = TRUE)

Relabel Factor Levels by Updating the Metadata

Description

Imagine the situation where we have a dataframe with a factor variable and we have stored it with write_vc(optimize = TRUE). The raw data file contains the factor indices and the metadata contains the link between the factor index and the corresponding label. See vignette("version_control", package = "git2rdata"). In such a case, relabelling a factor is fast and lightweight, because only the metadata needs to change.
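Why relabelling only needs the metadata can be illustrated with a plain R factor, which also stores integer codes separately from its labels. This is a base R analogy, not git2rdata's own code; the object names are illustrative.

```r
# Base R analogy: a factor keeps integer codes apart from its level labels,
# much like the raw data and the metadata of an optimized git2rdata object.
x <- factor(c("a1", "a2", "a2"))
codes_before <- as.integer(x)  # 1 2 2

# Relabel "a2" as "a3" by editing only the level labels
levels(x)[levels(x) == "a2"] <- "a3"
codes_after <- as.integer(x)

levels(x)                             # "a1" "a3"
identical(codes_before, codes_after)  # TRUE: the codes are untouched
```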

Usage

relabel(file, root = ".", change)

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`change`	either a `list` or a `data.frame`. A `list` must be a named `list` of named vectors: the names of the list elements must match the names of the variables, the names of the vector elements must match the existing factor labels, and the values are the new factor labels. A `data.frame` must contain the variables `factor` (the name of the factor), `old` (the old factor label) and `new` (the new factor label). `relabel()` ignores all other columns.
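The two forms of `change` carry the same information. As a sketch, the hypothetical helper `change_list_to_df()` (not part of git2rdata) flattens the list form into the data.frame form; the list used here encodes the same relabelling as the data.frame in the example.

```r
# Hypothetical helper (not part of git2rdata): flatten the list form of
# `change` into the equivalent data.frame with columns factor, old and new.
change_list_to_df <- function(change) {
  stopifnot(is.list(change), !is.null(names(change)))
  rows <- lapply(names(change), function(variable) {
    labels <- unlist(change[[variable]])  # accepts named vectors or named lists
    data.frame(
      factor = rep(variable, length(labels)),
      old = names(labels),
      new = unname(labels),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

change_list <- list(a = c(a3 = "c2", a1 = "c1"), b = c(b2 = "b3"))
change_df <- change_list_to_df(change_list)
change_df
```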

Value

invisible NULL.

Examples


# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")

# Create a dataframe and store it as an optimized git2rdata object.
# Note that write_vc() uses optimization by default.
# Stage and commit the git2rdata object.
ds <- data.frame(
  a = c("a1", "a2"),
  b = c("b2", "b1"),
  stringsAsFactors = TRUE
)
junk <- write_vc(ds, "relabel", repo, sorting = "b", stage = TRUE)
cm <- commit(repo, "initial commit")
# check that the workspace is clean
status(repo)

# Define new labels as a list and apply them to the git2rdata object.
new_labels <- list(
  a = list(a2 = "a3")
)
relabel("relabel", repo, new_labels)
# check the changes
read_vc("relabel", repo)
# relabel() changed the metadata, not the raw data
status(repo)
git2r::add(repo, "relabel.*")
cm <- commit(repo, "relabel using a list")

# Define new labels as a dataframe and apply them to the git2rdata object
change <- data.frame(
  factor = c("a", "a", "b"),
  old = c("a3", "a1", "b2"),
  new = c("c2", "c1", "b3"),
  stringsAsFactors = TRUE
)
relabel("relabel", repo, change)
# check the changes
read_vc("relabel", repo)
# relabel() changed the metadata, not the raw data
status(repo)

Rename a Variable

Description

The raw data file contains a header with the variable names. The metadata lists the variable names and their types. Changing a variable name and overwriting the git2rdata object will result in an error, because it looks like removing an existing variable and adding a new one. Overwriting the object with `strict = FALSE` potentially changes the order of the variables, leading to a large diff.

Usage

rename_variable(file, change, root = ".", ...)

## S3 method for class 'character'
rename_variable(file, change, root = ".", ...)

## Default S3 method:
rename_variable(file, change, root, ...)

## S3 method for class 'git_repository'
rename_variable(file, change, root, ..., stage = FALSE, force = FALSE)

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`change`	A named vector with the old names as values and the new names as names.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`...`	parameters used in some methods
`stage`	Logical value indicating whether to stage the changes after writing the data. Defaults to `FALSE`.
`force`	Add ignored files. Default is FALSE.

Details

`rename_variable()` avoids these problems by updating only the raw data header and the metadata.
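The idea of touching only the header can be sketched on an in-memory data.frame: renaming a column changes the names but neither the values nor the column order, so a row-based diff of the data would stay empty. `rename_header()` is a hypothetical helper that follows the `change` convention (new names as names, old names as values); it is not the package's implementation.

```r
# Hypothetical helper (not part of git2rdata): rename columns using the
# `change` convention of rename_variable(): names = new, values = old.
rename_header <- function(x, change) {
  stopifnot(is.character(change), !is.null(names(change)))
  missing_vars <- setdiff(change, colnames(x))
  if (length(missing_vars) > 0) {
    stop("unknown variable(s): ", paste(missing_vars, collapse = ", "))
  }
  idx <- match(change, colnames(x))
  colnames(x)[idx] <- names(change)  # only the names change, not the values
  x
}

ds <- data.frame(a = c("a1", "a2"), b = c("b2", "b1"))
renamed <- rename_header(ds, c(new_name = "a"))
colnames(renamed)  # "new_name" "b"
```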

Value

invisible NULL.

Examples


# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")

# Create a dataframe and store it as an optimized git2rdata object.
# Note that write_vc() uses optimization by default.
# Stage and commit the git2rdata object.
ds <- data.frame(
  a = c("a1", "a2"),
  b = c("b2", "b1"),
  stringsAsFactors = TRUE
)
junk <- write_vc(ds, "rename", repo, sorting = "b", stage = TRUE)
cm <- commit(repo, "initial commit")
# check that the workspace is clean
status(repo)

# Define the change: the new name is the element name, the old name the value.
change <- c(new_name = "a")
rename_variable(file = "rename", change = change, root = repo)
# check the changes
read_vc("rename", repo)
status(repo)

Re-exported Function From `git2r`

Description

See repository in git2r.

Remove Data Files From Git2rdata Objects

Description

Remove the data (.tsv) file from all valid git2rdata objects at the path. The metadata remains untouched. A warning lists any git2rdata object with invalid metadata. The function keeps any .tsv file with invalid metadata or from non-git2rdata objects.

Use this function with caution, since it removes all valid data files without asking for confirmation. We strongly recommend using this function on files under version control. See vignette("workflow", package = "git2rdata") for examples of how to use it.

Usage

rm_data(root = ".", path = NULL, recursive = TRUE, ...)

## S3 method for class 'git_repository'
rm_data(
  root,
  path = NULL,
  recursive = TRUE,
  ...,
  stage = FALSE,
  type = c("unmodified", "modified", "ignored", "all")
)

Arguments

`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`path`	the directory in which to clean all the data files. The directory is relative to `root`.
`recursive`	remove files in subdirectories too.
`...`	parameters used in some methods
`stage`	stage the changes after removing the files. Defaults to FALSE.
`type`	Defines the classes of files to remove. `unmodified` are files in the git history and unchanged since the last commit. `modified` are files in the git history and changed since the last commit. `ignored` refers to files listed in a `.gitignore` file. Selecting `modified` removes both `unmodified` and `modified` data files. Selecting `ignored` removes `unmodified`, `modified` and `ignored` data files. `all` refers to all visible data files, including `untracked` files.
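The cumulative meaning of `type` can be summarised in a small lookup. `classes_removed()` is a hypothetical helper, not part of git2rdata, and treating `all` as unmodified, modified, ignored and untracked files is an assumption based on the description of `all` as every visible data file.

```r
# Hypothetical helper (not part of git2rdata): file classes that rm_data()
# removes for each value of `type`, following the cumulative rule.
classes_removed <- function(type = c("unmodified", "modified", "ignored", "all")) {
  type <- match.arg(type)  # defaults to "unmodified"
  switch(
    type,
    unmodified = "unmodified",
    modified = c("unmodified", "modified"),
    ignored = c("unmodified", "modified", "ignored"),
    # assumption: "all visible data files" also covers ignored files
    all = c("unmodified", "modified", "ignored", "untracked")
  )
}

classes_removed()           # "unmodified"
classes_removed("ignored")  # "unmodified" "modified" "ignored"
```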

Value

Invisibly returns a vector of the removed file names. The paths are relative to `root`.

Examples

## on file system

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# store a dataframe as git2rdata object. Capture the result to minimise
# screen output
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# write a standard tab-separated file (non-git2rdata object)
write.table(iris, file = file.path(root, "standard.tsv"), sep = "\t")
# write a YAML file
yml <- list(
  authors = list(
    "Research Institute for Nature and Forest" = list(
      href = "https://www.inbo.be/en"
    )
  )
)
yaml::write_yaml(yml, file = file.path(root, "_pkgdown.yml"))

# list the git2rdata objects
list_data(root)
# list the files
list.files(root, recursive = TRUE)

# remove all .tsv files from valid git2rdata objects
rm_data(root, path = ".")
# check the removal of the .tsv file
list.files(root, recursive = TRUE)
list_data(root)

# remove dangling git2rdata metadata files
prune_meta(root, path = ".")
# check the removal of the metadata
list.files(root, recursive = TRUE)
list_data(root)


## on git repo

# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")

# store a dataframe
write_vc(iris[1:6, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE)
# check that the dataframe is stored
status(repo)
list_data(repo)

# commit the current version and check the git repo
commit(repo, "add iris data", session = TRUE)
status(repo)

# remove the data files from the repo
rm_data(repo, path = ".")
# check the removal
list_data(repo)
status(repo)

# remove dangling metadata
prune_meta(repo, path = ".")
# check the removal
list_data(repo)
status(repo)

Re-exported Function From `git2r`

Description

See status in git2r.

Summary method for `git2rdata` objects.

Description

Prints the summary of the data and the description of the columns when available.

Usage

## S3 method for class 'git2rdata'
summary(object, ...)

Arguments

`object`	a `git2rdata` object
`...`	additional arguments passed to `summary`

Update the description of a `git2rdata` object

Description

Updates the field descriptions, the table name, the title, and the description of a git2rdata object. All arguments are optional. Setting an argument to `NA` or an empty string removes the corresponding field from the metadata.

Usage

update_metadata(
  file,
  root = ".",
  field_description,
  name,
  title,
  description,
  ...
)

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`field_description`	a named character vector with the new descriptions for the fields. The names of the vector must match the variable names.
`name`	a character string with the new table name of the object.
`title`	a character string with the new title of the object.
`description`	a character string with the new description of the object.
`...`	parameters used in some methods
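A minimal sketch of the removal rule, assuming the metadata behaves like a named list. `apply_update()` is a hypothetical helper covering only `title` and `description`; it is not how `update_metadata()` is implemented.

```r
# Hypothetical sketch (not git2rdata's implementation): NA or "" removes a
# field, a missing argument leaves it untouched, any other value replaces it.
apply_update <- function(meta, title, description) {
  update_field <- function(meta, field, value) {
    if (is.na(value) || !nzchar(value)) {
      meta[[field]] <- NULL   # NA or "" removes the field
    } else {
      meta[[field]] <- value  # otherwise store the new value
    }
    meta
  }
  if (!missing(title)) meta <- update_field(meta, "title", title)
  if (!missing(description)) meta <- update_field(meta, "description", description)
  meta
}

meta <- list(title = "Old title", description = "Old description")
meta <- apply_update(meta, title = "Iris subset", description = NA)
str(meta)  # title updated, description removed
```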

Upgrade Files to the New Version

Description

Updates the data written by older versions to the current data format standard. Works both on a single file and (recursively) on a path. The ".yml" file must contain a "..generic" element. upgrade_data() ignores all other files.

Usage

upgrade_data(file, root = ".", verbose, ..., path)

## S3 method for class 'git_repository'
upgrade_data(
  file,
  root = ".",
  verbose = TRUE,
  ...,
  path,
  stage = FALSE,
  force = FALSE
)

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`verbose`	display a message with the update status. Defaults to `TRUE`.
`...`	parameters used in some methods
`path`	specify `path` instead of `file` to update all git2rdata objects in this directory and its subdirectories. `path` is relative to `root`. Use `path = "."` to upgrade all git2rdata objects under `root`.
`stage`	Logical value indicating whether to stage the changes after writing the data. Defaults to `FALSE`.
`force`	Add ignored files. Default is FALSE.

Value

the git2rdata object names.

Examples

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# write dataframes to the root
write_vc(
  iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length",
  digits = 6
)
write_vc(
  iris[5:10, ], file = "subdir/iris", root = root, sorting = "Sepal.Length",
  digits = 6
)
# upgrade a single git2rdata object
upgrade_data(file = "iris", root = root)
# use path = "." to upgrade all git2rdata objects under root
upgrade_data(path = ".", root = root)

Read a file and verify the presence of variables

Description

Reads the file with read_vc(), then verifies that every variable listed in `variables` is present in the data.frame.

Usage

verify_vc(file, root, variables)

Arguments

`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`variables`	a character vector with variable names.
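The check itself can be sketched on an in-memory data.frame. `check_variables()` is hypothetical: unlike `verify_vc()` it skips the read_vc() step, and its error message is illustrative.

```r
# Hypothetical sketch of the presence check, applied to a data.frame that is
# already in memory instead of a git2rdata object read with read_vc().
check_variables <- function(x, variables) {
  stopifnot(inherits(x, "data.frame"), is.character(variables))
  missing_vars <- variables[!variables %in% colnames(x)]
  if (length(missing_vars) > 0) {
    stop("missing variable(s): ", paste(missing_vars, collapse = ", "))
  }
  invisible(x)
}

check_variables(iris, c("Species", "Sepal.Length"))  # passes silently
try(check_variables(iris, c("Species", "Colour")))   # error: Colour is missing
```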

Store a Data.Frame as a Git2rdata Object on Disk

Description

A git2rdata object consists of two files. The ".tsv" file contains the raw data as a plain text tab-separated file. The ".yml" file contains the metadata on the columns in plain text YAML format. See vignette("plain_text", package = "git2rdata") for more details on the implementation.

Usage

write_vc(
  x,
  file,
  root = ".",
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  split_by
)

## S3 method for class 'character'
write_vc(
  x,
  file,
  root = ".",
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  append = FALSE,
  split_by = character(0),
  digits
)

## S3 method for class 'git_repository'
write_vc(
  x,
  file,
  root,
  sorting,
  strict = TRUE,
  optimize = TRUE,
  na = "NA",
  ...,
  stage = FALSE,
  force = FALSE
)

Arguments

`x`	the `data.frame`.
`file`	the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. `file` is a path relative to the `root`. Note that `file` must point to a location within `root`.
`root`	The root of a project. Can be a file path or a `git-repository`. Defaults to the current working directory (`"."`).
`sorting`	an optional vector of column names defining which columns to use for sorting `x` and in what order to use them. The default empty `sorting` yields a warning. Add `sorting` to avoid this warning. Strongly recommended in combination with version control. See `vignette("efficiency", package = "git2rdata")` for an illustration of the importance of sorting.
`strict`	What to do when the metadata changes. `strict = FALSE` overwrites the data and the metadata with a warning listing the changes, `strict = TRUE` returns an error and leaves the data and metadata as is. Defaults to `TRUE`.
`optimize`	If `TRUE`, recode the data to get smaller text files. If `FALSE`, `meta()` converts the data to character. Defaults to `TRUE`.
`na`	the string to use for missing values in the data.
`...`	parameters used in some methods
`split_by`	An optional vector of variable names used to split the text files. This creates a separate file for every combination of their values. These variables are prepended to the vector of `sorting` variables.
`append`	logical. Only relevant if `file` is a character string. If `TRUE`, the output is appended to the file. If `FALSE`, any existing file of the name is destroyed.
`digits`	The number of significant digits of the smallest absolute value. The function applies the rounding automatically. Only relevant for numeric variables. Either a single positive integer or a named vector where the names link to the variables in the `data.frame`. Defaults to `6` with a warning.
`stage`	Logical value indicating whether to stage the changes after writing the data. Defaults to `FALSE`.
`force`	Add ignored files. Default is FALSE.
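Two of these arguments lend themselves to a small base R sketch. The first part mimics `split_by` by prepending the split variables to the sorting variables and splitting the sorted data into one chunk per combination. The second part shows one possible reading of `digits`: round every value to the number of decimal places that leaves the smallest non-zero absolute value with `digits` significant digits. `round_smallest()` is hypothetical and the exact rounding rule in write_vc() may differ.

```r
# Hypothetical sketch of two write_vc() arguments on an in-memory data.frame;
# this is not git2rdata's implementation.

# split_by: prepend the split variables to the sorting variables, sort, and
# keep one chunk (one file in write_vc()) per combination of their values.
split_by <- "Species"
sorting <- "Sepal.Length"
full_sorting <- c(split_by, sorting)
x <- iris[do.call(order, unname(as.list(iris[full_sorting]))), ]
chunks <- split(x, x[split_by], drop = TRUE)
length(chunks)  # 3, one per species

# digits (assumed reading): round to the number of decimal places that keeps
# `digits` significant digits in the smallest non-zero absolute value.
round_smallest <- function(v, digits = 6) {
  smallest <- min(abs(v[!is.na(v) & v != 0]))
  round(v, digits - 1 - floor(log10(smallest)))
}
round_smallest(c(0.0123456789, 12.3456789), digits = 3)  # 0.0123 12.3457
```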

Value

a named vector with the file paths relative to root. The names contain the hashes of the files.

Note

..generic is a reserved name for the metadata and is a forbidden column name in a data.frame.
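Because `..generic` is reserved, a caller could check column names before calling write_vc(). `check_reserved()` is a hypothetical helper, not part of git2rdata.

```r
# Hypothetical pre-flight check (not part of git2rdata): stop early when a
# data.frame uses the reserved column name "..generic".
check_reserved <- function(x) {
  if ("..generic" %in% colnames(x)) {
    stop("'..generic' is reserved for git2rdata metadata; rename that column")
  }
  invisible(TRUE)
}

check_reserved(iris)                         # passes
bad <- setNames(data.frame(1), "..generic")
try(check_reserved(bad))                     # error
```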

Examples

## on file system

# create a directory
root <- tempfile("git2rdata-")
dir.create(root)

# write a dataframe to the directory
write_vc(
  iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length",
  digits = 6
)
# check that a data file (.tsv) and a metadata file (.yml) exist.
list.files(root, recursive = TRUE)
# read the git2rdata object from the directory
read_vc("iris", root)

# store a new version with different observations but the same metadata
write_vc(iris[1:5, ], "iris", root)
list.files(root, recursive = TRUE)
# Removing a column requires new metadata.
# Add strict = FALSE to override the existing metadata.
write_vc(
  iris[1:6, -2], "iris", root, sorting = "Sepal.Length", strict = FALSE
)
list.files(root, recursive = TRUE)
# storing the original version again requires another update of the metadata
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Width", strict = FALSE)
list.files(root, recursive = TRUE)
# optimize = FALSE stores the data more verbosely. This requires larger files.
write_vc(
  iris[1:6, ], "iris2", root, sorting = "Sepal.Width", optimize = FALSE
)
list.files(root, recursive = TRUE)



## on git repo using a git2r::git-repository

# initialise a git repo using the git2r package
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")

# store a dataframe in git repo.
write_vc(iris[1:6, ], file = "iris", root = repo, sorting = "Sepal.Length")
# This git2rdata object is not staged by default.
status(repo)
# read a dataframe from a git repo
read_vc("iris", repo)

# store a new version in the git repo and stage it in one go
write_vc(iris[1:5, ], "iris", repo, stage = TRUE)
status(repo)

# store a verbose version in a different git2rdata object
write_vc(
  iris[1:6, ], "iris2", repo, sorting = "Sepal.Width", optimize = FALSE
)
status(repo)
