The idea behind rio is to simplify the process of importing data into R and exporting data from R. This process is, probably unnecessarily, extremely complex for beginning R users. Indeed, R supplies an entire manual describing the process of data import/export. And, despite all of that text, most of the packages described are (to varying degrees) out-of-date. Faster, simpler packages with fewer dependencies have been created for many of the file types described in that document. rio aims to unify data I/O (importing and exporting) into two simple functions, import and export, so that beginners (and experienced R users) never have to think twice (or even once) about the best way to read and write R data.
The core advantage of rio is that it makes assumptions that the user is probably willing to make. Specifically, rio uses the file extension of a file name to determine what kind of file it is. This is the same logic used by Windows OS, for example, in determining what application is associated with a given file type. By taking away the need to manually match a file type (which a beginner may not recognize) to a particular import or export function, rio allows almost all common data formats to be read with the same function.
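The extension-based dispatch described above can be sketched in a few lines of base R. This is only an illustration of the idea, not rio's actual implementation: the helper name guess_reader and the small set of extensions it handles are hypothetical.

```r
# Hypothetical helper illustrating extension-based dispatch.
# This is NOT rio's internal code; it only mimics the idea with base R readers.
guess_reader <- function(file) {
  ext <- tolower(tools::file_ext(file))  # e.g. "csv" for "data/iris.CSV"
  switch(ext,
    csv = function(f) utils::read.csv(f),
    tsv = function(f) utils::read.delim(f),
    rds = function(f) readRDS(f),
    stop("Unrecognized file format: '", ext, "'")
  )
}

# Write the same data in two formats, then read both through one entry point.
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
utils::write.csv(iris, csv_file, row.names = FALSE)
saveRDS(iris, rds_file)

from_csv <- guess_reader(csv_file)(csv_file)
from_rds <- guess_reader(rds_file)(rds_file)
dim(from_csv)
dim(from_rds)
```

The caller never names a reader function; the file name alone selects it, which is the assumption rio makes on the user's behalf.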
Because import and export are easy, an obvious next step is to use R as a simple data conversion utility. Transferring data files between various proprietary formats is always a pain and often expensive. The convert function therefore combines import and export to easily convert between file formats (thus providing a FOSS replacement for programs like Stat/Transfer or Sledgehammer).
rio supports a variety of different file formats for import and export.
|Format||Import||Export|
|Tab-separated data (.tsv)||Yes||Yes|
|Comma-separated data (.csv)||Yes||Yes|
|CSVY (CSV + YAML metadata header) (.csvy)||Yes||Yes|
|Feather R/Python interchange format (.feather)||Yes||Yes|
|Pipe-separated data (.psv)||Yes||Yes|
|Fixed-width format data (.fwf)||Yes||Yes|
|Serialized R objects (.rds)||Yes||Yes|
|Saved R objects (.RData)||Yes||Yes|
|SPSS and SPSS portable||Yes (.sav and .por)||Yes (.sav)|
|“XBASE” database files (.dbf)||Yes||Yes|
|Weka Attribute-Relation File Format (.arff)||Yes||Yes|
|R syntax (.R)||Yes||Yes|
|Shallow XML documents (.xml)||Yes||Yes|
|HTML Tables (.html)||Yes||Yes|
|SAS XPORT (.xpt)||Yes|
|Data Interchange Format (.dif)||Yes|
|OpenDocument Spreadsheet (.ods)||Yes|
|Fortran data (no recognized extension)||Yes|
|Clipboard (default is tsv)||Yes (Mac and Windows)||Yes (Mac and Windows)|
Additionally, any format that is not supported by rio but that has a known R implementation will produce an informative error message pointing to a package and import or export function. Unrecognized formats will yield a simple “Unrecognized file format” error.
rio allows you to import files in almost any format using one, typically single-argument, function.
import infers the file format from the file's extension and calls the appropriate data import function for you, returning a simple data.frame. This works for any of the formats listed above.
library("rio")
x <- import("iris.csv")
y <- import("iris.rds")
z <- import("iris.dta")
# confirm identical
all.equal(x, y, check.attributes = FALSE)
## [1] "Component \"Species\": target is character, current is factor"
all.equal(x, z, check.attributes = FALSE)
## [1] "Component \"Species\": target is character, current is labelled"
If for some reason a file does not have an extension, or has a file extension that does not match its actual type, you can manually specify a file format to override the format inference step. For example, we can read in a CSV file that does not have a file extension by specifying format = "csv":
head(import("iris_noext", format = "csv"))
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
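For a version of this override that can be run from scratch, the following sketch first creates the extensionless file itself. The file location (a path inside tempdir()) is an arbitrary choice made for illustration.

```r
library("rio")

# Write a CSV file whose name carries no extension (the path is arbitrary).
noext <- file.path(tempdir(), "iris_noext")
write.csv(iris, noext, row.names = FALSE)

# With no extension to inspect, the format is supplied explicitly.
dat <- import(noext, format = "csv")
dim(dat)
```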
The export capabilities of rio are somewhat more limited than its import capabilities. This reflects the functions available in the underlying R packages: import functions are often written to bring in data from other applications, whereas writing to the formats used by those applications rarely seems to be a development priority. That said, rio can export to every format with an export entry in the table above, for example:
library("rio")
export(iris, "iris.csv")
export(iris, "iris.rds")
export(iris, "iris.dta")
It is also easy to use export as part of an R pipeline (from magrittr or dplyr). For example, the following code uses export to save the results of a simple data transformation:
library("magrittr")
mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl + am, data = ., FUN = mean) %>%
  export(file = "mtcars2.dta")
The convert function links import and export by constructing a data frame from the imported file and immediately writing it back to disk. convert invisibly returns the file name of the exported file, so that it can be used to programmatically access the new file.
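As a small sketch of using that return value, the example below captures the file name by assignment and passes it straight back to import. The temporary file names and the choice of CSV and TSV are arbitrary, made only so the example needs no pre-existing files.

```r
library("rio")

# Temporary source and destination files (names are arbitrary).
src  <- tempfile(fileext = ".csv")
dest <- tempfile(fileext = ".tsv")
export(mtcars, src)

# The return value is invisible, so it only shows up when assigned.
out <- convert(src, dest)

file.exists(out)
dim(import(out))
```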
Because convert is just a thin wrapper for import and export, it is very easy to use. For example, we can convert a Stata file to SPSS format:
# create file to convert
export(iris, "iris.dta")
# convert Stata to SPSS
convert("iris.dta", "iris.sav")
convert also accepts lists of arguments for controlling import (in_opts) and export (out_opts), which are passed on to the underlying import or export methods. This could be useful, for example, for reading in a fixed-width format file and converting it to a comma-separated values file:
# create an ambiguous file
fwf <- tempfile(fileext = ".fwf")
cat(file = fwf, "123456", "987654", sep = "\n")
# see two ways to read in the file
identical(import(fwf, widths = c(1,2,3)), import(fwf, widths = c(1,-2,3)))
## [1] FALSE
# convert to CSV
convert(fwf, "fwf.csv", in_opts = list(widths = c(1,2,3)))
import("fwf.csv") # check conversion
##   V1 V2  V3
## 1  1 23 456
## 2  9 87 654
It is also possible to use rio on the command-line by calling Rscript with the -e (expression) argument. For example, to convert a file from Stata (.dta) to comma-separated values (.csv), simply do the following:
Rscript -e "rio::convert('iris.dta', 'iris.csv')"
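For repeated use, the same call could live in a small script that takes the file names as command-line arguments. The following is a minimal sketch under stated assumptions: the script name convert_file.R and the helper convert_from_args are hypothetical, not part of rio.

```r
# convert_file.R: hypothetical wrapper script, run as
#   Rscript convert_file.R input.dta output.csv
convert_from_args <- function(args) {
  if (length(args) != 2L) {
    stop("usage: Rscript convert_file.R <input file> <output file>")
  }
  # rio::convert() infers both formats from the file extensions.
  rio::convert(args[1], args[2])
}

# Only act when arguments were actually supplied on the command line.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0L) {
  out <- convert_from_args(args)
  message("Wrote ", out)
}
```

Keeping the logic in a function means the script can also be sourced and called from an interactive session without touching the command line.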