Everyone develops their own coding habits and style. Some people put a lot of effort into making their source code readable, while others don’t bother at all. Working together with other people is easier when everyone uses the same standard.
The checklist package defines a set of standards and
provides tools to validate whether your project or R package adheres to
these standards. It integrates several existing tools like rcmdcheck, lintr, devtools, desc, hunspell, pkgdown.
checklist always uses these tools with hard-coded settings,
ensuring that everyone uses the same settings. Since version 0.5.0,
every organisation can define its own settings in a central repository.
More information on that in vignette("organisation").
checklist tries to make on-boarding as easy as possible.
We do this by providing the interactive functions
create_package() and create_project(). These
functions don’t just provide a template that the user must fill in.
Instead they guide the user through a series of questions to set up the
project or package according to the best practices. Hence the user can
start from a solid basis rather than having to figure out all the best
practices by themselves. We deliberately choose the highest quality
level for R packages by enforcing all relevant checks. For
projects, we allow the user to choose which checks to apply. More
information on that in vignette("getting_started") or
vignette("getting_started_project"). You can apply (or
update) the checklist settings to an existing project or package using
setup_package() or setup_project(). However,
we recommend that you first get used to checklist by
creating a new project or package from scratch. Since
checklist enforces several best practices, it is easier to
learn them from the start rather than trying to adapt an existing
project. You might need to refactor some parts of your existing code to
meet the quality standards.
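As a rough sketch of the choice described above: new work starts with a create_*() function, existing work with a setup_*() function. The helpers setup_function() and apply_checklist() are hypothetical names introduced here for illustration and are not part of checklist. The sketch also assumes that setup_package() and setup_project() accept the folder as their first argument.

```r
# Hypothetical helper (not part of checklist): return the name of the
# function mentioned in the text for a new or an existing package or project.
setup_function <- function(type = c("package", "project"), existing = FALSE) {
  type <- match.arg(type)
  paste0(if (existing) "setup_" else "create_", type)
}

setup_function("package")
setup_function("project", existing = TRUE)

# Hypothetical wrapper: apply the checklist settings to an existing folder.
# Assumption: setup_package() and setup_project() take the folder as their
# first argument.
apply_checklist <- function(path = ".", type = c("package", "project")) {
  type <- match.arg(type)
  if (!requireNamespace("checklist", quietly = TRUE)) {
    stop("Please install the checklist package first.")
  }
  fun <- getExportedValue("checklist", setup_function(type, existing = TRUE))
  fun(path)
}
```

Because the checklist functions ask questions, a call such as apply_checklist(".", type = "project") only makes sense in an interactive session.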
Currently, checklist handles two different types of
projects: R packages and non-package R projects. For both types of
projects, checklist provides a set of checks to validate
the quality of your code. You can either choose to run the entire set of
checks at once, or run individual checks.
We have a set of rules that are relevant for both packages and non-package projects.
check_spelling(): checks the spelling in R code and markdown files using hunspell. More information in vignette("spelling").
check_lintr(): runs lintr::lint_dir() to check the code style. More information in vignette("coding_style").
check_filename(): checks whether all file names meet the naming conventions. More information in vignette("file_name").
check_folder(): checks whether the folder structure meets the conventions. More information in vignette("file_name").
check_license(): checks whether a valid license is present.
update_citation(): checks the citation metadata and updates the citation files.
On a package, you can run check_package() to run these
additional checks.
check_cran(): runs rcmdcheck::rcmdcheck() to check whether the package meets CRAN standards.
check_description(): checks the DESCRIPTION file for common problems.
check_documentation(): checks whether the documentation is up-to-date.
check_codemeta(): checks whether the codemeta.json file is present and valid.
pkgdown: checks whether the pkgdown website of the package can be built.
covr::package_coverage(): measures the code coverage. Adding more code should not (significantly) decrease the code coverage.
You can run these functions interactively on your machine. We
recommend running the individual checks while developing your code to fix
problems as soon as they arise. Before pushing your code to GitHub,
always run check_package() or check_project().
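Running a few individual checks while developing could look like the sketch below. run_checks() is a hypothetical helper, not a checklist function. It assumes that each check accepts the project folder as its first argument.

```r
# Hypothetical helper (not part of checklist): run a selection of the
# individual checks listed above on one folder.
run_checks <- function(path = ".",
                       checks = c("check_filename", "check_lintr",
                                  "check_spelling")) {
  known <- c("check_spelling", "check_lintr", "check_filename",
             "check_folder", "check_license")
  unknown <- setdiff(checks, known)
  if (length(unknown) > 0) {
    stop("Unknown check(s): ", paste(unknown, collapse = ", "))
  }
  if (!requireNamespace("checklist", quietly = TRUE)) {
    stop("Please install the checklist package first.")
  }
  for (check in checks) {
    message("Running ", check, "()")
    # Assumption: every check takes the folder as its first argument.
    getExportedValue("checklist", check)(path)
  }
  invisible(checks)
}
```

A call such as run_checks(".", checks = "check_lintr") fits the edit-and-fix loop. Before pushing, the full check_package() or check_project() run remains the reference.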
You can also add these checks as GitHub Actions, which
run them automatically after every push to the repository
on GitHub. We recommend setting up these
checks as required checks on GitHub. This prevents merging code into the
main branch that does not meet the quality standards. For an R
package, merging a pull request on GitHub also updates the pkgdown
website automatically and creates a release of the new version.
Therefore, you must increase the version of the package in the
DESCRIPTION file when starting a new branch.
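Increasing the version when starting a branch can be scripted. The sketch below uses only base R and assumes a three-part version number such as 0.5.0. bump_patch() and bump_description() are hypothetical helpers, not checklist functions. Note that write.dcf() may reflow long fields, so the desc package, with its desc_bump_version() function, is probably the more comfortable tool for a real DESCRIPTION file.

```r
# Hypothetical helper: increase the patch component of a version string.
# Assumption: the version has exactly three numeric parts.
bump_patch <- function(version) {
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  parts[3] <- parts[3] + 1L
  paste(parts, collapse = ".")
}

# Hypothetical helper: rewrite the Version field of a DESCRIPTION file.
bump_description <- function(file = "DESCRIPTION") {
  description <- read.dcf(file)
  new_version <- bump_patch(description[[1, "Version"]])
  description[1, "Version"] <- new_version
  write.dcf(description, file)
  invisible(new_version)
}

# Demonstration on a temporary file instead of a real package.
demo_file <- tempfile()
writeLines(c("Package: demo", "Version: 0.5.0"), demo_file)
bump_description(demo_file)
read.dcf(demo_file)[[1, "Version"]]
```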
Both check_package() and check_project()
generate an extensive report on the checks that were performed. The
report contains the following sections:
The report lists all errors, warnings and notes that were found
during the checks. You can use this report to fix the problems in your
code. After fixing the problems, you can run the checks again to see
whether all problems are solved. You must fix all errors in order to
pass check_package() or check_project(). You
should try to fix as many warnings and notes as possible too. Some
warnings and notes might be acceptable in certain situations. You can
document these exceptions via write_checklist() in the
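checklist.yml file. The report can have three categories of
warnings and notes: “new”, “allowed” and “missing”.
“new”: warnings and notes that are not documented in the checklist.yml file. They result in failing check_package() or check_project(). Allow them when fixing is not possible at the moment.
“allowed”: warnings and notes that are documented in the checklist.yml file. They don’t result in failing check_package() or check_project().
“missing”: warnings and notes that are documented in the checklist.yml file but no longer occur when running check_package() or check_project(). You should remove these exceptions from the documentation via write_checklist().
When you use version control like git,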
checklist can detect which files were changed since the
last commit. This is useful because some checks rewrite an improved
version of a file. E.g. check_documentation() regenerates
the documentation files from the roxygen2 tags in the R
code. If you forgot to regenerate the documentation after changing the
code, check_documentation() will detect that the
documentation files are out-of-date. checklist will
report this and show which changes took place.
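The three categories of warnings and notes described earlier boil down to comparing two sets: what the checks found and what checklist.yml documents. The sketch below illustrates that logic only. It is not how checklist implements it, and classify_messages() and the example messages are made up.

```r
# Hypothetical helper: compare the messages found by the checks with the
# messages documented as accepted.
classify_messages <- function(found, documented) {
  list(
    new = setdiff(found, documented),         # found, not documented
    allowed = intersect(found, documented),   # found and documented
    missing = setdiff(documented, found)      # documented, no longer found
  )
}

found <- c("note A", "warning B")       # made-up messages
documented <- c("warning B", "note C")  # made-up content of checklist.yml
classify_messages(found, documented)
```

In this example, "note A" would make the check fail until it is fixed or documented, and "note C" is an outdated exception to remove via write_checklist().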
Most problems are related to the R version or the versions of the packages that are used. To help you debug problems, the report contains the session info. This is especially important when checks pass locally but fail on GitHub Actions. When this occurs, you can compare the session info of your local machine with the session info on GitHub Actions to find out which package versions differ. Then use the same package versions locally to reproduce and fix the problem.
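Comparing two sets of package versions can be done by hand, but a small helper makes the differences easier to spot. The sketch below assumes that you copied the package versions from both session infos into named character vectors. compare_versions() is a hypothetical helper and the version numbers are invented for the example.

```r
# Hypothetical helper: list the packages whose version differs between two
# named character vectors (names are packages, values are versions).
compare_versions <- function(local_versions, remote_versions) {
  pkgs <- sort(union(names(local_versions), names(remote_versions)))
  out <- data.frame(
    package = pkgs,
    local = unname(local_versions[pkgs]),
    remote = unname(remote_versions[pkgs]),
    stringsAsFactors = FALSE
  )
  # A package that is absent on one side shows up as NA and counts as a
  # difference.
  differs <- is.na(out$local) | is.na(out$remote) | out$local != out$remote
  out[differs, , drop = FALSE]
}

# Invented version numbers, for illustration only.
local_versions <- c(covr = "3.6.4", desc = "1.4.3", lintr = "3.1.0")
remote_versions <- c(desc = "1.4.3", hunspell = "3.0.3", lintr = "3.1.2")
compare_versions(local_versions, remote_versions)
```

The local side could come from installed.packages()[, "Version"], which returns a character vector named by package.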
The GitHub Actions run on a Docker image based on rocker/verse:latest.
It should contain the latest R version and the latest versions of the
most common R packages. Any other packages that your code needs are
installed on the fly.
When you share your code with other people, you want them to be able
to find it easily. You also want them to be able to cite a specific
version of your code in a report or paper. checklist helps
you to achieve this by enforcing a strict usage of metadata either in
the DESCRIPTION (package) or README.md
(non-package). Because of the strict format of the person information,
we recommend adding most people when creating the project or package via
the interactive functions create_project() or
create_package(). The functions not only add the people to
the metadata, but also store their information for later reuse.
You can define a list of default organisations in your organisation’s
checklist repository. This ensures that everyone in the
organisation uses the standard names of these organisations. Based on
matching e-mail domains, checklist enforces that persons
use their official organisation name as affiliation. More information in
vignette("organisation").
When you push your code to GitHub, you can use the GitHub-Zenodo
integration to create a DOI for every release of your code.
checklist automatically adds the required metadata to a
.zenodo.json in your repository so that Zenodo can create a
proper citation for your code. When authors add their ORCID to their
person information, this ORCID is included in the Zenodo metadata too.
Then every release of your code can automatically flow into your ORCID
profile, which makes it easier to maintain an up-to-date list of your
publications. Your organisation can also import the publications of all
its researchers based on their ORCID. More information in
vignette("zenodo").
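For a package, the person information lives in the Authors@R field of the DESCRIPTION file, which is built from utils::person() objects. The sketch below shows one author with an ORCID stored in the comment. The name, e-mail address and organisation are placeholders, the ORCID is the well-known demonstration identifier, and storing the affiliation under the comment name "affiliation" is an assumption about the format checklist expects.

```r
# One author with an ORCID in the comment, as R itself supports.
# All personal details below are placeholders.
author <- person(
  given = "Josiah",
  family = "Carberry",
  email = "josiah.carberry@example.org",
  role = c("aut", "cre"),
  comment = c(
    ORCID = "0000-0002-1825-0097",
    affiliation = "Example Research Institute"  # assumed comment name
  )
)

# format() shows how the person appears in citations and metadata.
format(author)
```

The interactive functions create_package() and create_project() collect this information for you, so writing person() by hand is mostly useful to understand what ends up in the metadata.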
Most users think of an R package as a collection of generic functions that they can use to run their analysis. However, an R package is a useful way to bundle and document a stand-alone analysis too! Suppose you want to pass your code to a collaborator or your future self who is working on a different computer. If you have a project folder with a bunch of files, people will need to get to know your project structure, find out which scripts to run and which dependencies they need. Unless you documented everything well, they (including your future self!) will have a hard time figuring out how things work.
Having the analysis as a package and running
check_package() to ensure a minimal quality standard, makes
things a lot easier for the user. Agreed, it will take a bit more time
to create the analysis, especially with the first few projects. In the
long run, you save time thanks to the better quality of your code. When you
want to learn how to write a package, start by packaging a recurrent
analysis or standardised report. Once you have some experience, it is
little overhead to do it for smaller analyses. Keep in mind that you
seldom run an analysis exactly once.
Others can install the package with remotes::install_github("inbo/packagename").
The inst folder is an ideal place to bundle scripts within
the package. You can also use it to store small (!) datasets or
rmarkdown reports.