Title: | Store and Retrieve Data.frames in a Git Repository |
---|---|
Description: | The git2rdata package is an R package for writing and reading dataframes as plain text files. A metadata file stores important information. 1) Storing metadata makes it possible to maintain the classes of variables. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human-readable. The user can turn this off when they prefer a human-readable format over smaller files. Details on the implementation are available in vignette("plain_text", package = "git2rdata"). 2) Storing metadata also allows smaller row-based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette("version_control", package = "git2rdata"). Although we designed git2rdata with a git workflow in mind, you can use it in combination with other version control systems like Subversion or Mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette("workflow", package = "git2rdata") gives a toy example. 4) vignette("efficiency", package = "git2rdata") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading. |
Authors: | Thierry Onkelinx [aut, cre] (<https://orcid.org/0000-0001-8804-4216>, Research Institute for Nature and Forest (INBO)), Floris Vanderhaeghe [ctb] (<https://orcid.org/0000-0002-6378-6229>, Research Institute for Nature and Forest (INBO)), Peter Desmet [ctb] (<https://orcid.org/0000-0002-8442-8025>, Research Institute for Nature and Forest (INBO)), Els Lommelen [ctb] (<https://orcid.org/0000-0002-3481-5684>, Research Institute for Nature and Forest (INBO)), Research Institute for Nature and Forest (INBO) [cph, fnd] |
Maintainer: | Thierry Onkelinx <[email protected]> |
License: | GPL-3 |
Version: | 0.4.1 |
Built: | 2024-12-05 05:58:25 UTC |
Source: | https://github.com/ropensci/git2rdata |
commit() is re-exported from git2r. See commit in git2r.
Other version_control: pull(), push(), recent_commit(), repository(), status()
Display metadata for a git2rdata object.
display_metadata(x, minimal = FALSE)
x |
a |
minimal |
logical, if |
Other storage: list_data(), prune_meta(), read_vc(), relabel(), rename_variable(), rm_data(), update_metadata(), verify_vc(), write_vc()
A valid git2rdata object has valid metadata.
is_git2rdata(file, root = ".", message = c("none", "warning", "error"))
file |
the name of the git2rdata object. Git2rdata objects cannot
have dots in their name. The name may include a relative path. |
root |
The root of a project. Can be a file path or a |
message |
a single value ("none", "warning" or "error") indicating the type of
message to emit on top of the logical value. |
A logical value: TRUE in case of a valid git2rdata object, otherwise FALSE.
Other internal: is_git2rmeta(), meta(), print.git2rdata(), summary.git2rdata(), upgrade_data()
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# store a file
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# check the stored file
is_git2rmeta("iris", root)
is_git2rdata("iris", root)
# Remove the metadata from the existing git2rdata object. Then it stops
# being a git2rdata object.
junk <- file.remove(file.path(root, "iris.yml"))
is_git2rmeta("iris", root)
is_git2rdata("iris", root)
# recreate the file and remove the data and keep the metadata. It stops being
# a git2rdata object, but the metadata remains valid.
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
junk <- file.remove(file.path(root, "iris.tsv"))
is_git2rmeta("iris", root)
is_git2rdata("iris", root)
Valid metadata is a file with a .yml extension. It has a top-level item ..generic. This item contains git2rdata (the version number), hash (a hash on the metadata) and data_hash (a hash on the data file). The version number must be the current version.
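The validity rules above can be illustrated as a check on the parsed metadata. The sketch below is a minimal base-R illustration, not the implementation of is_git2rmeta(): it assumes the .yml file has already been parsed into a nested R list, it only compares the version as a string, and it does not recompute either hash. The helper has_valid_generic() and the hash values are hypothetical.

```r
# Hypothetical helper, not part of git2rdata: checks that a parsed metadata
# list has a top-level "..generic" element with the three required fields
# and that its version matches the expected version string.
has_valid_generic <- function(meta_list, current_version) {
  generic <- meta_list[["..generic"]]
  if (!is.list(generic)) {
    return(FALSE)
  }
  required <- c("git2rdata", "hash", "data_hash")
  if (!all(required %in% names(generic))) {
    return(FALSE)
  }
  # the stored version must equal the current version
  identical(as.character(generic[["git2rdata"]]), current_version)
}

# made-up metadata, shaped like a parsed .yml file
example_meta <- list(
  "..generic" = list(
    git2rdata = "0.4.1",
    hash = "placeholder-metadata-hash",
    data_hash = "placeholder-data-hash"
  ),
  Species = list(class = "factor")
)
has_valid_generic(example_meta, "0.4.1")
has_valid_generic(example_meta, "0.5.0")
```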
is_git2rmeta(file, root = ".", message = c("none", "warning", "error"))
file |
the name of the git2rdata object. Git2rdata objects cannot
have dots in their name. The name may include a relative path. |
root |
The root of a project. Can be a file path or a |
message |
a single value ("none", "warning" or "error") indicating the type of
message to emit on top of the logical value. |
A logical value: TRUE in case of a valid metadata file, otherwise FALSE.
Other internal: is_git2rdata(), meta(), print.git2rdata(), summary.git2rdata(), upgrade_data()
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# store a file
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# check the stored file
is_git2rmeta("iris", root)
is_git2rdata("iris", root)
# Remove the metadata from the existing git2rdata object. Then it stops
# being a git2rdata object.
junk <- file.remove(file.path(root, "iris.yml"))
is_git2rmeta("iris", root)
is_git2rdata("iris", root)
# recreate the file and remove the data and keep the metadata. It stops being
# a git2rdata object, but the metadata remains valid.
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
junk <- file.remove(file.path(root, "iris.tsv"))
is_git2rmeta("iris", root)
is_git2rdata("iris", root)
The function returns the names of all valid git2rdata objects, i.e. .tsv files with a matching valid metadata file (.yml). Invalid metadata files result in a warning. The function ignores valid metadata files without matching raw data (.tsv).
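The pairing rule can be sketched in base R by matching file names without their extension. This is a simplified illustration, not list_data() itself: the hypothetical paired_objects() helper only looks at file names and does not parse or validate the metadata, so it would not emit the warning for invalid .yml files.

```r
# Hypothetical helper, not git2rdata's list_data(): returns the names
# (relative paths without extension) that have both a .tsv and a .yml file.
paired_objects <- function(path) {
  files <- list.files(path, recursive = TRUE)
  tsv <- sub("\\.tsv$", "", files[grepl("\\.tsv$", files)])
  yml <- sub("\\.yml$", "", files[grepl("\\.yml$", files)])
  sort(intersect(tsv, yml))
}

root <- tempfile("pairs-")
dir.create(file.path(root, "sub"), recursive = TRUE)
# empty placeholder files, only the names matter for this sketch
invisible(file.create(file.path(root, c(
  "iris.tsv", "iris.yml", "standard.tsv", "orphan.yml",
  "sub/cars.tsv", "sub/cars.yml"
))))
paired_objects(root)  # "iris" "sub/cars"
```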
list_data(root = ".", path = ".", recursive = TRUE)
root |
the |
path |
relative |
recursive |
logical. Should the listing recurse into directories? |
A character vector of git2rdata object names, including their relative path.
Other storage: display_metadata(), prune_meta(), read_vc(), relabel(), rename_variable(), rm_data(), update_metadata(), verify_vc(), write_vc()
## on file system
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# store a dataframe as git2rdata object. Capture the result to minimise
# screen output
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# write a standard tab-separated file (non git2rdata object)
write.table(iris, file = file.path(root, "standard.tsv"), sep = "\t")
# write a YAML file
yml <- list(
  authors = list(
    "Research Institute for Nature and Forest" = list(
      href = "https://www.inbo.be/en")))
yaml::write_yaml(yml, file = file.path(root, "_pkgdown.yml"))
# list the git2rdata objects
list_data(root)
# list the files
list.files(root, recursive = TRUE)
# remove all .tsv files from valid git2rdata objects
rm_data(root, path = ".")
# check the removal of the .tsv file
list.files(root, recursive = TRUE)
list_data(root)
# remove dangling git2rdata metadata files
prune_meta(root, path = ".")
# check the removal of the metadata
list.files(root, recursive = TRUE)
list_data(root)
## on git repo
# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")
# store a dataframe
write_vc(iris[1:6, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE)
# check that the dataframe is stored
status(repo)
list_data(repo)
# commit the current version and check the git repo
commit(repo, "add iris data", session = TRUE)
status(repo)
# remove the data files from the repo
rm_data(repo, path = ".")
# check the removal
list_data(repo)
status(repo)
# remove dangling metadata
prune_meta(repo, path = ".")
# check the removal
list_data(repo)
status(repo)
Prepares a vector for storage. When relevant, meta() optimizes the object for storage by changing the format to one which needs fewer characters. The metadata stored in the meta attribute contains all required information to back-transform the optimized format into the original format.
In case of a data.frame, meta() applies itself to each of the columns. The meta attribute becomes a named list containing the metadata for each column plus an additional ..generic element. ..generic is a reserved name for the metadata and is not allowed as a column name in a data.frame.
write_vc() uses this function to prepare a dataframe for storage. Existing metadata is passed through the optional old argument. This argument is intended for internal use.
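The core idea of the factor optimization (store compact integer indices in the raw data, keep the labels in the metadata) can be sketched for a single factor in base R. The names optimize_factor() and restore_factor() and the fields of the metadata list are hypothetical; meta() itself records more information and write_vc() serializes it to YAML.

```r
# Hypothetical helpers, not git2rdata's meta(): encode a factor as integer
# indices plus a metadata list, and decode it back.
optimize_factor <- function(x) {
  stopifnot(is.factor(x))
  codes <- as.integer(x)  # missing values stay NA
  attr(codes, "meta") <- list(
    class = "factor",
    labels = levels(x),
    index = seq_along(levels(x)),
    ordered = is.ordered(x)
  )
  codes
}

restore_factor <- function(codes) {
  m <- attr(codes, "meta")
  # look up each stored index and rebuild the factor with the stored levels
  factor(
    m$labels[match(as.integer(codes), m$index)],
    levels = m$labels,
    ordered = m$ordered
  )
}

x <- factor(c("versicolor", NA, "setosa", "versicolor"),
            levels = c("setosa", "versicolor", "virginica"))
stored <- optimize_factor(x)
as.vector(stored)               # 2 NA 1 2: what the raw data would hold
identical(restore_factor(stored), x)
```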
meta(x, ...)
## S3 method for class 'character'
meta(x, na = "NA", optimize = TRUE, ...)
## S3 method for class 'factor'
meta(x, optimize = TRUE, na = "NA", index, strict = TRUE, ...)
## S3 method for class 'logical'
meta(x, optimize = TRUE, ...)
## S3 method for class 'POSIXct'
meta(x, optimize = TRUE, ...)
## S3 method for class 'Date'
meta(x, optimize = TRUE, ...)
## S3 method for class 'data.frame'
meta(
  x,
  optimize = TRUE,
  na = "NA",
  sorting,
  strict = TRUE,
  split_by = character(0),
  ...
)
x |
the vector. |
... |
further arguments to the methods. |
na |
the string to use for missing values in the data. |
optimize |
If |
index |
An optional named vector with existing factor indices.
The names must match the existing factor levels.
Unmatched levels from |
strict |
What to do when the metadata changes. |
sorting |
an optional vector of column names defining which columns to
use for sorting |
split_by |
An optional vector of variable names to split the text files.
This creates a separate file for every combination.
We prepend these variables to the vector of |
The optimized vector x with a meta attribute.
The default order of factor levels depends on the current locale. See Comparison for more details on that. The same code in a different locale might result in a different sorting. meta() ignores, with a warning, any change in the order of factor levels. Add strict = FALSE to enforce the new order of factor levels.
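A short illustration of the locale dependence in plain R, and of one way to avoid it by declaring the levels explicitly. This is general R behaviour (see ?sort and ?Comparison), not git2rdata code; the level order chosen for f is arbitrary.

```r
x <- c("b", "B", "a", "A")

# factor() sorts the unique values using the current locale, so the default
# level order (and hence the stored indices) can differ between machines.
levels(factor(x))

# Radix sorting ignores the locale and always uses C-locale (byte) order.
sort(unique(x), method = "radix")  # "A" "B" "a" "b"

# Explicit levels make the factor, and its indices, locale independent.
f <- factor(x, levels = c("a", "A", "b", "B"))
levels(f)
as.integer(f)  # 3 4 1 2
```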
Other internal: is_git2rdata(), is_git2rmeta(), print.git2rdata(), summary.git2rdata(), upgrade_data()
meta(c(NA, "'NA'", '"NA"', "abc\tdef", "abc\ndef"))
meta(1:3)
meta(seq(1, 3, length.out = 4))
meta(factor(c("b", NA, "NA"), levels = c("NA", "b", "c")))
meta(factor(c("b", NA, "a"), levels = c("a", "b", "c")), optimize = FALSE)
meta(factor(c("b", NA, "a"), levels = c("a", "b", "c"), ordered = TRUE))
meta(
  factor(c("b", NA, "a"), levels = c("a", "b", "c"), ordered = TRUE),
  optimize = FALSE
)
meta(c(FALSE, NA, TRUE))
meta(c(FALSE, NA, TRUE), optimize = FALSE)
meta(complex(real = c(1, NA, 2), imaginary = c(3, NA, -1)))
meta(as.POSIXct("2019-02-01 10:59:59", tz = "CET"))
meta(as.POSIXct("2019-02-01 10:59:59", tz = "CET"), optimize = FALSE)
meta(as.Date("2019-02-01"))
meta(as.Date("2019-02-01"), optimize = FALSE)
Print method for git2rdata objects. Prints the data and the description of the columns when available.
## S3 method for class 'git2rdata'
print(x, ...)
x |
a |
... |
additional arguments passed to |
Other internal: is_git2rdata(), is_git2rmeta(), meta(), summary.git2rdata(), upgrade_data()
Removes all valid metadata (.yml files) from the path when they don't have accompanying data (.tsv file). Invalid metadata triggers a warning without removing the metadata file.
Use this function with caution since it will remove all valid metadata files without asking for confirmation. We strongly recommend using this function only on files under version control. See vignette("workflow", package = "git2rdata") for some examples on how to use this.
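Given the warning above, it can help to preview which files would be affected before calling prune_meta(). The sketch below is a hypothetical dry-run helper in base R, not part of git2rdata: it lists .yml files without a matching .tsv file but, unlike prune_meta(), does not check whether the metadata is valid and never deletes anything.

```r
# Hypothetical dry-run helper, not part of git2rdata: list .yml files that
# have no .tsv file with the same name. Nothing is deleted.
dangling_metadata <- function(path) {
  files <- list.files(path, pattern = "\\.(tsv|yml)$", recursive = TRUE)
  tsv_stems <- sub("\\.tsv$", "", files[grepl("\\.tsv$", files)])
  yml <- files[grepl("\\.yml$", files)]
  yml[!sub("\\.yml$", "", yml) %in% tsv_stems]
}

root <- tempfile("prune-")
dir.create(root)
# empty placeholder files: "iris" is complete, "old" only has metadata
invisible(file.create(file.path(root, c("iris.tsv", "iris.yml", "old.yml"))))
candidates <- dangling_metadata(root)
candidates  # "old.yml"
# after reviewing the candidates, they could be removed with:
# file.remove(file.path(root, candidates))
```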
prune_meta(root = ".", path = NULL, recursive = TRUE, ...)
## S3 method for class 'git_repository'
prune_meta(root, path = NULL, recursive = TRUE, ..., stage = FALSE)
root |
The root of a project. Can be a file path or a |
path |
the directory in which to clean all the data files. The directory
is relative to |
recursive |
remove files in subdirectories too. |
... |
parameters used in some methods |
stage |
stage the changes after removing the files. Defaults to |
Invisibly returns a vector of removed file names. The paths are relative to root.
Other storage: display_metadata(), list_data(), read_vc(), relabel(), rename_variable(), rm_data(), update_metadata(), verify_vc(), write_vc()
## on file system
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# store a dataframe as git2rdata object. Capture the result to minimise
# screen output
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# write a standard tab-separated file (non git2rdata object)
write.table(iris, file = file.path(root, "standard.tsv"), sep = "\t")
# write a YAML file
yml <- list(
  authors = list(
    "Research Institute for Nature and Forest" = list(
      href = "https://www.inbo.be/en")))
yaml::write_yaml(yml, file = file.path(root, "_pkgdown.yml"))
# list the git2rdata objects
list_data(root)
# list the files
list.files(root, recursive = TRUE)
# remove all .tsv files from valid git2rdata objects
rm_data(root, path = ".")
# check the removal of the .tsv file
list.files(root, recursive = TRUE)
list_data(root)
# remove dangling git2rdata metadata files
prune_meta(root, path = ".")
# check the removal of the metadata
list.files(root, recursive = TRUE)
list_data(root)
## on git repo
# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")
# store a dataframe
write_vc(iris[1:6, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE)
# check that the dataframe is stored
status(repo)
list_data(repo)
# commit the current version and check the git repo
commit(repo, "add iris data", session = TRUE)
status(repo)
# remove the data files from the repo
rm_data(repo, path = ".")
# check the removal
list_data(repo)
status(repo)
# remove dangling metadata
prune_meta(repo, path = ".")
# check the removal
list_data(repo)
status(repo)
pull() is re-exported from git2r. See pull in git2r.
Other version_control: commit(), push(), recent_commit(), repository(), status()
push() is re-exported from git2r. See push in git2r.
Other version_control: commit(), pull(), recent_commit(), repository(), status()
read_vc() handles git2rdata objects stored by write_vc(). It reads and verifies the metadata file (.yml). Then it reads and verifies the raw data. The last step is back-transforming any transformation done by meta() to return the data.frame as stored by write_vc().
read_vc() is an S3 generic on root which currently handles "character" (a path) and "git_repository" (from git2r). S3 methods for other version control systems could be added.
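Dispatching on root is ordinary S3 dispatch on the second argument. A minimal sketch of the pattern, assuming nothing about git2rdata's internals: the generic read_object(), the class "my_vcs" and its workdir field are hypothetical, and the methods only build the path of the raw data file instead of reading it.

```r
# Hypothetical generic that dispatches on the class of `root`, not `file`.
read_object <- function(file, root = ".") {
  UseMethod("read_object", root)
}

# a plain path: build the location of the raw data file
read_object.character <- function(file, root = ".") {
  file.path(root, paste0(file, ".tsv"))
}

read_object.default <- function(file, root = ".") {
  stop("no read_object() method for class: ",
       paste(class(root), collapse = ", "))
}

# Supporting another version control system only needs a new method for the
# class of its repository object; here it delegates to the path method.
read_object.my_vcs <- function(file, root = ".") {
  read_object(file, root = root$workdir)
}

repo <- structure(list(workdir = "project"), class = "my_vcs")
read_object("iris", "data")  # "data/iris.tsv"
read_object("iris", repo)    # "project/iris.tsv"
```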
read_vc(file, root = ".")
file |
the name of the git2rdata object. Git2rdata objects cannot
have dots in their name. The name may include a relative path. |
root |
The root of a project. Can be a file path or a |
The data.frame
with the file names and hashes as attributes.
It has the additional class "git2rdata"
to support extra methods to
display the descriptions.
Other storage: display_metadata(), list_data(), prune_meta(), relabel(), rename_variable(), rm_data(), update_metadata(), verify_vc(), write_vc()
## on file system
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# write a dataframe to the directory
write_vc(iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length")
# check that a data file (.tsv) and a metadata file (.yml) exist.
list.files(root, recursive = TRUE)
# read the git2rdata object from the directory
read_vc("iris", root)
# store a new version with different observations but the same metadata
write_vc(iris[1:5, ], "iris", root)
list.files(root, recursive = TRUE)
# Removing a column requires new metadata.
# Add strict = FALSE to override the existing metadata.
write_vc(
  iris[1:6, -2], "iris", root, sorting = "Sepal.Length", strict = FALSE
)
list.files(root, recursive = TRUE)
# storing the original version again requires another update of the metadata
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Width", strict = FALSE)
list.files(root, recursive = TRUE)
# optimize = FALSE stores the data in a more verbose format, which requires
# larger files.
write_vc(
  iris[1:6, ], "iris2", root, sorting = "Sepal.Width", optimize = FALSE
)
list.files(root, recursive = TRUE)
## on git repo using a git2r git_repository object
# initialise a git repo using the git2r package
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")
# store a dataframe in git repo.
write_vc(iris[1:6, ], file = "iris", root = repo, sorting = "Sepal.Length")
# This git2rdata object is not staged by default.
status(repo)
# read a dataframe from a git repo
read_vc("iris", repo)
# store a new version in the git repo and stage it in one go
write_vc(iris[1:5, ], "iris", repo, stage = TRUE)
status(repo)
# store a verbose version in a different git2rdata object
write_vc(
  iris[1:6, ], "iris2", repo, sorting = "Sepal.Width", optimize = FALSE
)
status(repo)
Retrieve the most recent commit that added or updated a file or git2rdata object. This does not imply that the file still exists at the current HEAD, as it ignores the deletion of files.
Use this information to document the current version of a file or git2rdata object in an analysis. Since it refers to the most recent change of this file, it remains unchanged by committing changes to other files. You can also use it to track whether the data got updated, requiring an analysis to be rerun. See vignette("workflow", package = "git2rdata").
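One way to use this for deciding whether to rerun an analysis is to store the commit returned alongside a result and compare it later. The sketch below assumes only the commit, author and when columns described under the return value; needs_rerun() is a hypothetical helper and the commit hashes are made up, so it runs without a repository.

```r
# Hypothetical helper: an analysis needs a rerun when the most recent commit
# touching the data differs from the commit recorded with the last result.
needs_rerun <- function(recorded, current) {
  !identical(as.character(recorded$commit), as.character(current$commit))
}

# made-up values mimicking the data.frame returned by recent_commit()
recorded <- data.frame(
  commit = "a1b2c3", author = "Alice",
  when = as.POSIXct("2024-01-01 10:00:00", tz = "UTC"),
  stringsAsFactors = FALSE
)
unchanged <- recorded
updated <- data.frame(
  commit = "d4e5f6", author = "Alice",
  when = as.POSIXct("2024-02-01 09:30:00", tz = "UTC"),
  stringsAsFactors = FALSE
)
needs_rerun(recorded, unchanged)  # FALSE
needs_rerun(recorded, updated)    # TRUE
```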
recent_commit(file, root, data = FALSE)
file |
the name of the git2rdata object. Git2rdata objects cannot
have dots in their name. The name may include a relative path. |
root |
The root of a project. Can be a file path or a |
data |
does |
A data.frame with commit, author and when for the most recent commit that adds or updates the file.
Other version_control: commit(), pull(), push(), repository(), status()
# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")
# write and commit a first dataframe
# store the output of write_vc() to minimize screen output
junk <- write_vc(iris[1:6, ], "iris", repo, sorting = "Sepal.Length",
                 stage = TRUE)
commit(repo, "important analysis", session = TRUE)
list.files(repo_path)
Sys.sleep(1.1) # required because git doesn't handle subsecond timings
# write and commit a second dataframe
junk <- write_vc(iris[7:12, ], "iris2", repo, sorting = "Sepal.Length",
                 stage = TRUE)
commit(repo, "important analysis", session = TRUE)
list.files(repo_path)
Sys.sleep(1.1) # required because git doesn't handle subsecond timings
# write and commit a new version of the first dataframe
junk <- write_vc(iris[7:12, ], "iris", repo, stage = TRUE)
list.files(repo_path)
commit(repo, "important analysis", session = TRUE)
# find out in which commit a file was last changed
# "iris.tsv" was last updated in the third commit
recent_commit("iris.tsv", repo)
# "iris.yml" was last updated in the first commit
recent_commit("iris.yml", repo)
# "iris2.yml" was last updated in the second commit
recent_commit("iris2.yml", repo)
# the git2rdata object "iris" was last updated in the third commit
recent_commit("iris", repo, data = TRUE)
# remove a dataframe and commit it to see what happens with deleted files
file.remove(file.path(repo_path, "iris.tsv"))
prune_meta(repo, ".")
commit(repo, message = "remove iris", all = TRUE, session = TRUE)
list.files(repo_path)
# still points to the third commit as this is the latest commit in which the
# data was present
recent_commit("iris", repo, data = TRUE)
Imagine the situation where we have a dataframe with a factor variable and we have stored it with write_vc(optimize = TRUE). The raw data file contains the factor indices and the metadata contains the link between the factor index and the corresponding label. See vignette("version_control", package = "git2rdata"). In such a case, relabelling a factor can be fast and lightweight, because only the metadata needs updating.
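Why relabelling is cheap can be seen in a small base-R model of an optimized factor: the raw data keeps the indices, so only the label list changes. This is a hypothetical sketch, not relabel(); relabel_meta() and the layout of the meta list are assumptions made for illustration.

```r
codes <- c(1L, 2L, 2L)                           # raw data: factor indices
meta <- list(index = 1:2, labels = c("a1", "a2"))  # hypothetical metadata

# change: named character vector, names = old labels, values = new labels
relabel_meta <- function(meta, change) {
  hit <- meta$labels %in% names(change)
  meta$labels[hit] <- unname(change[meta$labels[hit]])
  meta
}

new_meta <- relabel_meta(meta, c(a2 = "a3"))
# decoding the unchanged indices with the new metadata gives the new labels
decoded <- factor(new_meta$labels[match(codes, new_meta$index)],
                  levels = new_meta$labels)
as.character(decoded)  # "a1" "a3" "a3"
```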
relabel(file, root = ".", change)
file |
the name of the git2rdata object. Git2rdata objects cannot
have dots in their name. The name may include a relative path. |
root |
The root of a project. Can be a file path or a |
change |
either a |
Invisible NULL.
Other storage: display_metadata(), list_data(), prune_meta(), read_vc(), rename_variable(), rm_data(), update_metadata(), verify_vc(), write_vc()
# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "alice@example.org")
# Create a dataframe and store it as an optimized git2rdata object.
# Note that write_vc() uses optimization by default.
# Stage and commit the git2rdata object.
ds <- data.frame(
  a = c("a1", "a2"),
  b = c("b2", "b1"),
  stringsAsFactors = TRUE
)
junk <- write_vc(ds, "relabel", repo, sorting = "b", stage = TRUE)
cm <- commit(repo, "initial commit")
# check that the workspace is clean
status(repo)
# Define new labels as a list and apply them to the git2rdata object.
new_labels <- list(
  a = list(a2 = "a3")
)
relabel("relabel", repo, new_labels)
# check the changes
read_vc("relabel", repo)
# relabel() changed the metadata, not the raw data
status(repo)
git2r::add(repo, "relabel.*")
cm <- commit(repo, "relabel using a list")
# Define new labels as a dataframe and apply them to the git2rdata object
change <- data.frame(
  factor = c("a", "a", "b"),
  old = c("a3", "a1", "b2"),
  new = c("c2", "c1", "b3"),
  stringsAsFactors = TRUE
)
relabel("relabel", repo, change)
# check the changes
read_vc("relabel", repo)
# relabel() changed the metadata, not the raw data
status(repo)
The raw data file contains a header with the variable names.
The metadata lists the variable names and their types.
Changing a variable name and overwriting the git2rdata object results in an
error, because it looks like removing an existing variable and adding a new one.
Overwriting the object with strict = FALSE potentially changes the order of
the variables, leading to a large diff.
rename_variable(file, change, root = ".", ...)

## S3 method for class 'character'
rename_variable(file, change, root = ".", ...)

## Default S3 method:
rename_variable(file, change, root, ...)

## S3 method for class 'git_repository'
rename_variable(file, change, root, ..., stage = FALSE, force = FALSE)
file | the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. |
change | A named vector with the old names as values and the new names as names. |
root | The root of a project. Can be a file path or a git_repository object. |
... | parameters used in some methods |
stage | Logical value indicating whether to stage the changes after writing the data. Defaults to FALSE. |
force | Add ignored files. Default is FALSE. |
rename_variable() avoids these problems by updating only the raw data header and the metadata.
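A minimal base R sketch of the idea behind a header-only rename. The helper rename_header() is hypothetical and not part of git2rdata; it assumes an unquoted tab-separated file with the variable names on the first line, and it leaves the metadata aside.

```r
# Hypothetical helper, not part of git2rdata: rename variables by rewriting
# only the header (first line) of an unquoted tab-separated file.
# `change` follows rename_variable(): new names as names, old names as values.
rename_header <- function(path, change) {
  lines <- readLines(path)
  header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
  idx <- match(change, header)
  if (anyNA(idx)) {
    stop("unknown variable(s): ", paste(change[is.na(idx)], collapse = ", "))
  }
  header[idx] <- names(change)
  lines[1] <- paste(header, collapse = "\t")
  writeLines(lines, path)
  invisible(NULL)
}

path <- tempfile(fileext = ".tsv")
writeLines(c("a\tb", "1\t2", "2\t1"), path)
before <- readLines(path)
rename_header(path, c(new_name = "a"))
after <- readLines(path)
sum(before != after)  # 1: only the header line differs
```

Because every data line stays byte-identical, a version control system records a one-line change for the data file.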
invisible NULL.
Other storage: display_metadata(), list_data(), prune_meta(), read_vc(), relabel(), rm_data(), update_metadata(), verify_vc(), write_vc()
# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")
# Create a dataframe and store it as an optimized git2rdata object.
# Note that write_vc() uses optimization by default.
# Stage and commit the git2rdata object.
ds <- data.frame(
  a = c("a1", "a2"),
  b = c("b2", "b1"),
  stringsAsFactors = TRUE
)
junk <- write_vc(ds, "rename", repo, sorting = "b", stage = TRUE)
cm <- commit(repo, "initial commit")
# check that the workspace is clean
status(repo)
# Define change.
change <- c(new_name = "a")
rename_variable(file = "rename", change = change, root = repo)
# check the changes
read_vc("rename", repo)
status(repo)
git2r
See repository in git2r.
Other version_control: commit(), pull(), push(), recent_commit(), status()
Remove the data (.tsv) file from all valid git2rdata objects at the path.
The metadata remains untouched. A warning lists any git2rdata object with
invalid metadata. The function keeps any .tsv file with invalid metadata or
from non-git2rdata objects.
Use this function with caution since it removes all valid data files without
asking for confirmation. We strongly recommend using this function on files
under version control. See vignette("workflow", package = "git2rdata") for
some examples on how to use it.
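To make the selection rule concrete, here is a hypothetical helper, paired_data_files(), written in base R. It is a simplification: it treats a .tsv file as removable whenever a .yml file with the same base name exists, whereas rm_data() also checks that the metadata is valid.

```r
# Hypothetical helper: list .tsv files (relative to root) that have a .yml
# file with the same base name. Unlike rm_data(), it does not validate the
# metadata; it only checks that both files exist.
paired_data_files <- function(root, recursive = TRUE) {
  tsv <- list.files(root, pattern = "\\.tsv$", recursive = recursive)
  yml <- sub("\\.tsv$", ".yml", tsv)
  tsv[file.exists(file.path(root, yml))]
}

root <- tempfile("rm-sketch-")
dir.create(root)
writeLines("x", file.path(root, "iris.tsv"))
writeLines("..generic: {}", file.path(root, "iris.yml"))
writeLines("x", file.path(root, "standard.tsv"))  # no metadata: kept

to_remove <- paired_data_files(root)
file.remove(file.path(root, to_remove))  # the .yml files stay in place
list.files(root)
```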
rm_data(root = ".", path = NULL, recursive = TRUE, ...)

## S3 method for class 'git_repository'
rm_data(
  root,
  path = NULL,
  recursive = TRUE,
  ...,
  stage = FALSE,
  type = c("unmodified", "modified", "ignored", "all")
)
root | The root of a project. Can be a file path or a git_repository object. |
path | the directory in which to clean all the data files. The directory is relative to root. |
recursive | remove files in subdirectories too. |
... | parameters used in some methods |
stage | stage the changes after removing the files. Defaults to FALSE. |
type | Defines the classes of files to remove: "unmodified", "modified", "ignored" or "all". |
Returns invisibly a vector of removed file names. The paths are relative to root.
Other storage: display_metadata(), list_data(), prune_meta(), read_vc(), relabel(), rename_variable(), update_metadata(), verify_vc(), write_vc()
## on file system
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# store a dataframe as git2rdata object. Capture the result to minimise
# screen output
junk <- write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Length")
# write a standard tab-separated file (non git2rdata object)
write.table(iris, file = file.path(root, "standard.tsv"), sep = "\t")
# write a YAML file
yml <- list(
  authors = list(
    "Research Institute for Nature and Forest" = list(
      href = "https://www.inbo.be/en")))
yaml::write_yaml(yml, file = file.path(root, "_pkgdown.yml"))
# list the git2rdata objects
list_data(root)
# list the files
list.files(root, recursive = TRUE)
# remove all .tsv files from valid git2rdata objects
rm_data(root, path = ".")
# check the removal of the .tsv file
list.files(root, recursive = TRUE)
list_data(root)
# remove dangling git2rdata metadata files
prune_meta(root, path = ".")
# check the removal of the metadata
list.files(root, recursive = TRUE)
list_data(root)

## on git repo
# initialise a git repo using git2r
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")
# store a dataframe
write_vc(iris[1:6, ], "iris", repo, sorting = "Sepal.Length", stage = TRUE)
# check that the dataframe is stored
status(repo)
list_data(repo)
# commit the current version and check the git repo
commit(repo, "add iris data", session = TRUE)
status(repo)
# remove the data files from the repo
rm_data(repo, path = ".")
# check the removal
list_data(repo)
status(repo)
# remove dangling metadata
prune_meta(repo, path = ".")
# check the removal
list_data(repo)
status(repo)
git2r
See status in git2r.
Other version_control: commit(), pull(), push(), recent_commit(), repository()
git2rdata objects
Prints the summary of the data and the description of the columns when available.
## S3 method for class 'git2rdata'
summary(object, ...)
object | a git2rdata object |
... | additional arguments passed to |
Other internal: is_git2rdata(), is_git2rmeta(), meta(), print.git2rdata(), upgrade_data()
git2rdata object
Allows updating the description of the fields, the table name, the title,
and the description of a git2rdata object.
All arguments are optional.
Setting an argument to NA or an empty string removes the corresponding
field from the metadata.
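The rule for NA and empty strings can be sketched on a plain R list standing in for the metadata. apply_meta_update() is a hypothetical helper, not the package's implementation, and the field values in the example are only for illustration.

```r
# Hypothetical helper: a supplied value replaces a field, NA or "" removes it,
# and fields that are not supplied stay as they are.
apply_meta_update <- function(meta, ...) {
  updates <- list(...)
  for (field in names(updates)) {
    value <- updates[[field]]
    if (length(value) == 1 && (is.na(value) || identical(value, ""))) {
      meta[[field]] <- NULL  # drop the field
    } else {
      meta[[field]] <- value  # replace or add the field
    }
  }
  meta
}

meta <- list(name = "iris", title = "Old title", description = "Some text")
meta <- apply_meta_update(meta, title = "Fisher's iris data", description = NA)
str(meta)
```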
update_metadata(file, root = ".", field_description, name, title, description)
file | the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. |
root | The root of a project. Can be a file path or a git_repository object. |
field_description | a named character vector with the new descriptions for the fields. The names of the vector must match the variable names. |
name | a character string with the new table name of the object. |
title | a character string with the new title of the object. |
description | a character string with the new description of the object. |
Other storage: display_metadata(), list_data(), prune_meta(), read_vc(), relabel(), rename_variable(), rm_data(), verify_vc(), write_vc()
Updates the data written by older versions to the current data format
standard. Works both on a single file and (recursively) on a path. The
".yml" file must contain a "..generic" element. upgrade_data() ignores
all other files.
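A rough base R sketch of the selection step: find the .yml files under a path that contain a top-level "..generic" key. find_generic_yml() is hypothetical; it scans the text line by line instead of parsing YAML, so it assumes the key starts at the beginning of a line. The file contents below are dummies.

```r
# Hypothetical helper: return the .yml files under `path` (relative paths) that
# contain a line starting with "..generic:". A text heuristic, not a YAML parser.
find_generic_yml <- function(path) {
  rel <- list.files(path, pattern = "\\.yml$", recursive = TRUE)
  keep <- vapply(
    rel,
    function(f) any(startsWith(readLines(file.path(path, f), warn = FALSE), "..generic:")),
    logical(1)
  )
  rel[keep]
}

root <- tempfile("upgrade-sketch-")
dir.create(file.path(root, "subdir"), recursive = TRUE)
writeLines(c("..generic:", "  note: dummy content"), file.path(root, "iris.yml"))
writeLines(c("..generic:", "  note: dummy content"), file.path(root, "subdir", "iris.yml"))
writeLines("url: https://www.inbo.be/en", file.path(root, "_pkgdown.yml"))

found <- find_generic_yml(root)
found
```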
upgrade_data(file, root = ".", verbose, ..., path)

## S3 method for class 'git_repository'
upgrade_data(
  file,
  root = ".",
  verbose = TRUE,
  ...,
  path,
  stage = FALSE,
  force = FALSE
)
file | the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. |
root | The root of a project. Can be a file path or a git_repository object. |
verbose | display a message with the update status. Defaults to |
... | parameters used in some methods |
path | specify |
stage | Logical value indicating whether to stage the changes after writing the data. Defaults to FALSE. |
force | Add ignored files. Default is FALSE. |
the git2rdata object names.
Other internal: is_git2rdata(), is_git2rmeta(), meta(), print.git2rdata(), summary.git2rdata()
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# write dataframes to the root
write_vc(iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length")
write_vc(iris[5:10, ], file = "subdir/iris", root = root, sorting = "Sepal.Length")
# upgrade a single git2rdata object
upgrade_data(file = "iris", root = root)
# use path = "." to upgrade all git2rdata objects under root
upgrade_data(path = ".", root = root)
Reads the file with read_vc(). Then verifies that every variable listed in
variables is present in the data.frame.
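The verification step can be sketched in base R with setdiff(). check_variables() is a hypothetical helper that works on a data.frame already in memory; it does not read a git2rdata object and is not the package's code.

```r
# Hypothetical helper: stop when any requested variable is absent from `x`.
check_variables <- function(x, variables) {
  absent <- setdiff(variables, colnames(x))
  if (length(absent) > 0) {
    stop("missing variable(s): ", paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(x)
}

ok <- check_variables(iris, c("Species", "Sepal.Length"))
res <- tryCatch(
  check_variables(iris, c("Species", "Colour")),
  error = conditionMessage
)
res  # "missing variable(s): Colour"
```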
verify_vc(file, root, variables)
file | the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. |
root | The root of a project. Can be a file path or a git_repository object. |
variables | a character vector with variable names. |
Other storage: display_metadata(), list_data(), prune_meta(), read_vc(), relabel(), rename_variable(), rm_data(), update_metadata(), write_vc()
A git2rdata object consists of two files.
The ".tsv" file contains the raw data as a plain text, tab-separated file.
The ".yml" file contains the metadata on the columns in plain text YAML format.
See vignette("plain_text", package = "git2rdata") for more details on the
implementation.
write_vc(x, file, root = ".", sorting, strict = TRUE, optimize = TRUE,
  na = "NA", ..., split_by)

## S3 method for class 'character'
write_vc(x, file, root = ".", sorting, strict = TRUE, optimize = TRUE,
  na = "NA", ..., split_by = character(0))

## S3 method for class 'git_repository'
write_vc(x, file, root, sorting, strict = TRUE, optimize = TRUE,
  na = "NA", ..., stage = FALSE, force = FALSE)
x | the data.frame |
file | the name of the git2rdata object. Git2rdata objects cannot have dots in their name. The name may include a relative path. |
root | The root of a project. Can be a file path or a git_repository object. |
sorting | an optional vector of column names defining which columns to use for sorting |
strict | What to do when the metadata changes. |
optimize | If |
na | the string to use for missing values in the data. |
... | parameters used in some methods |
split_by | An optional vector of variable names to split the text files. This creates a separate file for every combination. We prepend these variables to the vector of |
stage | Logical value indicating whether to stage the changes after writing the data. Defaults to FALSE. |
force | Add ignored files. Default is FALSE. |
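The sketch below illustrates, in base R only, why sorting and split_by matter for version control. write_sorted_tsv() is a hypothetical helper, and naming the split files after the Species values is an assumption made for readability; the package may name and lay out its files differently.

```r
# Hypothetical helper: write `x` as an unquoted TSV after ordering the rows
# by the `sorting` columns. unname() keeps a column called, e.g., "method"
# from being matched to an argument of order().
write_sorted_tsv <- function(x, path, sorting) {
  x <- x[do.call(order, unname(as.list(x[sorting]))), , drop = FALSE]
  write.table(x, path, sep = "\t", quote = FALSE, row.names = FALSE)
}

# The same rows in a different input order give an identical file
ds <- iris[1:6, ]
p1 <- tempfile(fileext = ".tsv")
p2 <- tempfile(fileext = ".tsv")
write_sorted_tsv(ds, p1, sorting = "Sepal.Length")
write_sorted_tsv(ds[6:1, ], p2, sorting = "Sepal.Length")
identical(readLines(p1), readLines(p2))  # TRUE

# split_by: one file per combination of the split variables
split_by <- "Species"
split_dir <- tempfile("split-sketch-")
dir.create(split_dir)
parts <- split(iris, iris[split_by], drop = TRUE)
for (key in names(parts)) {
  write_sorted_tsv(parts[[key]], file.path(split_dir, paste0(key, ".tsv")),
                   sorting = "Sepal.Length")
}
list.files(split_dir)
```

Note that order() breaks ties by input order, so rows that tie on every sorting column can still move when the input order changes.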
a named vector with the file paths relative to root. The names contain the
hashes of the files.
..generic is a reserved name for the metadata and is a forbidden column name
in a data.frame.
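A hypothetical guard, check_reserved(), shows how a caller might reject such a data.frame before writing it. It is a base R sketch, not the check git2rdata performs internally.

```r
# Hypothetical guard: reject a data.frame that uses the reserved column name.
check_reserved <- function(x) {
  if ("..generic" %in% colnames(x)) {
    stop("'..generic' is a reserved name and cannot be a column name", call. = FALSE)
  }
  invisible(TRUE)
}

check_reserved(iris)  # passes silently
bad <- data.frame(`..generic` = 1, check.names = FALSE)
err <- tryCatch(check_reserved(bad), error = function(e) e)
inherits(err, "error")  # TRUE
```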
Other storage: display_metadata(), list_data(), prune_meta(), read_vc(), relabel(), rename_variable(), rm_data(), update_metadata(), verify_vc()
## on file system
# create a directory
root <- tempfile("git2rdata-")
dir.create(root)
# write a dataframe to the directory
write_vc(iris[1:6, ], file = "iris", root = root, sorting = "Sepal.Length")
# check that a data file (.tsv) and a metadata file (.yml) exist.
list.files(root, recursive = TRUE)
# read the git2rdata object from the directory
read_vc("iris", root)
# store a new version with different observations but the same metadata
write_vc(iris[1:5, ], "iris", root)
list.files(root, recursive = TRUE)
# Removing a column requires new metadata.
# Add strict = FALSE to override the existing metadata.
write_vc(
  iris[1:6, -2], "iris", root, sorting = "Sepal.Length", strict = FALSE
)
list.files(root, recursive = TRUE)
# storing the original version again requires another update of the metadata
write_vc(iris[1:6, ], "iris", root, sorting = "Sepal.Width", strict = FALSE)
list.files(root, recursive = TRUE)
# optimize = FALSE stores the data in a more verbose format, resulting in
# larger files.
write_vc(
  iris[1:6, ], "iris2", root, sorting = "Sepal.Width", optimize = FALSE
)
list.files(root, recursive = TRUE)

## on git repo using a git2r::git_repository
# initialise a git repo using the git2r package
repo_path <- tempfile("git2rdata-repo-")
dir.create(repo_path)
repo <- git2r::init(repo_path)
git2r::config(repo, user.name = "Alice", user.email = "[email protected]")
# store a dataframe in git repo.
write_vc(iris[1:6, ], file = "iris", root = repo, sorting = "Sepal.Length")
# This git2rdata object is not staged by default.
status(repo)
# read a dataframe from a git repo
read_vc("iris", repo)
# store a new version in the git repo and stage it in one go
write_vc(iris[1:5, ], "iris", repo, stage = TRUE)
status(repo)
# store a verbose version in a different git2rdata object
write_vc(
  iris[1:6, ], "iris2", repo, sorting = "Sepal.Width", optimize = FALSE
)
status(repo)