Maintaining Variable Classes
R offers several ways to store a dataframe as a plain text file. Base R
has write.table() and its companions such as write.csv(). Other options
are data.table::fwrite(), readr::write_delim(), readr::write_csv() and
readr::write_tsv().
Each of them writes a dataframe as a plain text file by converting all
variables to character strings. When the file is read back, the
corresponding reading functions (such as read.table() or
readr::read_csv()) try to revert this conversion by guessing the class
of each variable, and the distinction between character and factor gets
lost in translation. Before R 4.0.0, read.table() converted all strings
to factors by default; since R 4.0.0 its default is
stringsAsFactors = FALSE. readr::read_csv() keeps all strings as
character by default. Even when strings are converted to factors, these
functions cannot recover the original factor levels: they derive the
levels from the values observed in the plain text file. Hence factor
levels without observations disappear. The order of the factor levels is
likewise derived from the values in the plain text file and can
therefore differ from the original order.
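A minimal base R sketch of this loss, using a hypothetical dataframe df whose factor colour has an unused level "blue" and a non-alphabetical level order. After a round trip through write.csv() and read.csv(), the unused level is gone and the remaining levels come back in sorted order rather than the original one.

```r
# Hypothetical example data: "blue" is a level without observations,
# and the level order ("red" before "green") is deliberately not alphabetical.
df <- data.frame(
  id = 1:3,
  colour = factor(c("red", "green", "red"), levels = c("red", "green", "blue"))
)
levels(df$colour)   # "red" "green" "blue"

# Round trip through a plain text file.
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)

# Ask read.csv() explicitly for factors (the default is FALSE since R 4.0.0).
back <- read.csv(tmp, stringsAsFactors = TRUE)
levels(back$colour) # "green" "red": "blue" is lost and the order changed
```

The values themselves survive the round trip; only the metadata stored in the factor levels is lost.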
The write_vc() and read_vc() functions from git2rdata keep track of the
class of each variable and, in case of a factor, also of the factor
levels and their order. Hence this function pair preserves the
information content of the dataframe. The vc suffix stands for version
control, as these functions reach their full potential in combination
with a version control system.
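A sketch of the same round trip with git2rdata, assuming the package is installed. The file name "colours", the temporary root directory and sorting = "id" are illustrative choices; the sorting argument tells write_vc() which variable determines the row order in the stored file. Following the description above, read_vc() should return colour as a factor with all three levels in their original order.

```r
library(git2rdata)

# Same hypothetical data as in the base R example.
df <- data.frame(
  id = 1:3,
  colour = factor(c("red", "green", "red"), levels = c("red", "green", "blue"))
)

# Illustrative storage location: a fresh temporary directory.
root <- tempfile("git2rdata-demo")
dir.create(root)

# write_vc() stores the data plus metadata (classes, factor levels and their order).
write_vc(df, file = "colours", root = root, sorting = "id")

# read_vc() uses that metadata to restore the variable classes.
back <- read_vc(file = "colours", root = root)
levels(back$colour)
```

Unlike the write.csv() round trip, the unused level "blue" and the original level order are kept, because they are stored alongside the data rather than inferred from it.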