This R package supports the use of standardized folder names in R projects. The idea is to provide some functions to allow you to avoid using hardcoded paths and setwd() in your R scripts. Instead, you can use variables like folders$data to refer to folder paths. These paths can be standardized between projects. The folders can be created for you under the parent folder of your R project.
Using the defaults, or some other standardized list of folder names, all of your projects can have the same general folder structure. This can help you write cleaner, more portable, and more reproducible code.
The package defaults provide “code”, “conf”, “data”, “doc”, “figures” and “results” folders. You can specify alternatives in a YAML configuration file, which this package will read and use instead. See “Configuration file” below for more details.
You will note there is a “code” folder. If your scripts are in the “code” folder, your code will still be able to find the other folders, thanks to the here package.
This package is intended to be used with RStudio Projects.
A benefit of using RStudio Projects is that, once you open the project in RStudio, you will be placed in the parent folder of your project (a.k.a. the “project root”). All of your work in the project will be relative to that location, especially if your project only uses files and subfolders within that parent folder. This is the most portable way to work. Further, if you are working with a git repository, you will most likely want to clone this repository into an RStudio Project.
This package will also work outside of RStudio Projects. For example, if you are working in a folder tracked by git, then the top level of the git repository will be identified as the “project root” folder. This behavior is determined by the here package.
If you are neither working in an RStudio project, nor in a folder tracked by a version control system (git or Subversion), nor an R package development folder, then the current working directory at the time the here package was loaded will be treated as the “project root” folder.
Or you can force a folder to be the “project root” with a .here file. You can create one with the here::set_here() function. See the here package documentation for more information. However, if your goal is to write more reproducible code and follow best practices, you should really ask yourself why you are not using RStudio Projects or version control.
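The root-finding idea described above can be illustrated with a small base R sketch. This is a simplified illustration only, not the here package's actual implementation: the helper name find_root_sketch() and its list of marker files are hypothetical, and the real package checks more criteria than this.

```r
# Hypothetical helper: walk up from `start` until a marker file is found.
# Simplified illustration only; the here package uses its own, richer rules.
find_root_sketch <- function(start = getwd(), markers = c(".here", ".git")) {
  start <- normalizePath(start, winslash = "/", mustWork = TRUE)
  dir <- start
  repeat {
    # A directory qualifies if it holds a marker or any *.Rproj file
    has_marker <- any(file.exists(file.path(dir, markers))) ||
      length(list.files(dir, pattern = "\\.Rproj$")) > 0
    if (has_marker) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) break  # reached the top of the file system
    dir <- parent
  }
  # No marker found anywhere: fall back to the starting directory
  start
}

# Demo in a temporary directory tree with a .here file at its top level
root <- file.path(tempdir(), "demo_project")
dir.create(file.path(root, "code", "analysis"), recursive = TRUE,
           showWarnings = FALSE)
invisible(file.create(file.path(root, ".here")))

found <- find_root_sketch(file.path(root, "code", "analysis"))
print(basename(found))
```

In the demo, the search starts two levels below the folder containing the .here file and stops as soon as it reaches that folder.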
You can install the stable version from CRAN with:
install.packages("folders")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("deohs/folders")
Or, if you prefer using pacman:
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load_gh("deohs/folders")
The following code chunk can be used at the beginning of your scripts to make use of standardized folders in your projects.
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, folders)
# Get the list of standard folders and create any folders which are missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
result <- create_folders(folders)
Then, later in your scripts, you can refer to folders like this:
dir.exists(here(folders$data))
## [1] TRUE
Or you can add to the standard folder paths like this:
file_path <- here(folders$data, "data.csv")
Here is an example of a script which will initialize the folders and then write a data file to the folders$data folder. Aside from setting the path to the configuration file, there are no hardcoded paths and there is no setwd().
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, folders)
# Get the list of standard folders and create any folders which are missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
result <- create_folders(folders)
# Check to see that the data folder has been created
dir.exists(here(folders$data))
## [1] TRUE
# Create a dataset to use for writing a CSV file to the data folder
df <- data.frame(x = letters[1:3], y = 1:3)
# Confirm that the CSV file does not yet exist
file_path <- here(folders$data, "data.csv")
file.exists(file_path)
## [1] FALSE
# Write the CSV file
write.csv(df, file_path, row.names = FALSE)
# Verify that the file was written
file.exists(file_path)
## [1] TRUE
# Cleanup unused (empty) folders (Optional, as you may prefer to keep them)
result <- cleanup_folders(folders, conf_file)
# Verify that the data folder and CSV file still exist after cleanup
file.exists(file_path)
## [1] TRUE
# Verify that the configuration file still exists after cleanup
file.exists(conf_file)
## [1] TRUE
You can refer to subfolders relative to the paths in your folders list using here(). For example, if you had a folder called “raw” under your data folder, just refer to that folder with here(folders$data, "raw"):
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
raw_df <- here(folders$data, "raw", "file.csv")
If you want to create a subfolder hierarchy under all of your main folders, you can use lapply() or purrr::map() to create that hierarchy. For example, we can create a “phase” folder under each folder in folders and then a “01” folder under each “phase” folder:
conf_file <- here('conf', 'folders.yml')
folders <- lapply(get_folders(conf_file), here, "phase", "01")
res <- create_folders(folders)
You can place that near the top of each of your scripts, adjusting for the project phase the script is used for, and you can then use folders$data to refer to a path like data/phase/01 within the parent folder. This way, your scripts can always refer to the appropriate data, results, etc., folder for that project phase using the same variables, e.g., folders$data, folders$results, etc.
df <- read.csv(here(folders$data, "data.csv"))
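To see what the lapply() call above produces without touching the file system, here is a minimal base R sketch. It makes two substitutions for illustration: a hand-written list stands in for the output of get_folders(), and file.path() stands in for here(), so the resulting paths stay relative rather than being anchored at the project root.

```r
# Stand-in for the list returned by get_folders() (assumed for illustration)
folders <- list(data = "data", figures = "figures", results = "results")

# Append "phase/<id>" to every folder path; file.path() stands in for here()
phase_id <- "01"
phase_folders <- lapply(folders, file.path, "phase", phase_id)

# List names are preserved, so the same variables work in every phase
print(phase_folders$data)
## [1] "data/phase/01"
print(names(phase_folders))
## [1] "data"    "figures" "results"
```

Changing only phase_id at the top of a script is then enough to point every folder variable at a different phase.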
The configuration file, if not already present, will be written by get_folders() to a YAML file with a path and filename that you provide. Usually this would be named something like folders.yml, as in the examples above, and usually you will want this file stored in either the parent folder or the “conf” subfolder of your R project. This file will be read by config::get() on subsequent executions of get_folders(). This behavior can be modified by function parameters.
The default configuration file looks like:
default:
  code: code
  conf: conf
  data: data
  doc: doc
  figures: figures
  results: results
Once this file has been created, you can edit it to modify the default folder paths. However, we advise you to stick to the defaults to maintain maximum consistency between your projects. If you do wish to edit it, you can do so with any text editor or with R as shown below.
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, yaml, folders)
# Get the list of standard folders, creating the configuration file if missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
# Replace a default with a custom folder path
folders$data <- "data_folder"
# Edit the default configuration file to save the modification
write_yaml(list(default = folders), file = conf_file)
You can add other configuration names besides “default”. For example, if you had a path that would be operating system dependent, you could edit your configuration file like this (abbreviated here to show only the data path):
default:
  data: data
Windows:
  data: //server/path/to/data
Linux:
  data: /path/to/data
Darwin:
  data: /Volumes/path/to/data
And then you can read in the appropriate paths for the system you are using:
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file, conf_name = Sys.info()[['sysname']])
data_folder <- folders$data
… so that your script can still be platform independent even though the data path is not. If your specified conf_name does not exist in the configuration file, then the defaults are used instead.
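If you want to check which configuration name would apply on the machine you are using, a short base R sketch like the following may help. It only mirrors the fallback rule described above for illustration; the conf_names vector is an assumption that simply repeats the names from the example configuration file, and the folders package is not called here.

```r
# Operating system name as reported by R, e.g. "Windows", "Linux" or "Darwin"
sysname <- Sys.info()[["sysname"]]

# Configuration names assumed to be present in the example file above
conf_names <- c("default", "Windows", "Linux", "Darwin")

# Mirror the documented fallback: use "default" when there is no match
conf_name <- if (sysname %in% conf_names) sysname else "default"
message("Configuration that would be used: ", conf_name)
```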
When you install this package, the following dependencies should be installed for you: config, here, yaml.
You will need to load the here package with your scripts to make the most use of the folders package, as seen in the Basic Usage examples above.