rfishbase

Accessing data from FishBase using R

rfishbase is an R package for interfacing with the fishbase.org database. It provides code to download and parse a local copy of FishBase, which gives a variety of analysis functions rapid access to the data.

Installation

rfishbase is available on CRAN and can be installed through the R package manager (see install.packages). The latest (development) version is hosted here on GitHub and can be installed using the devtools package:

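# install.packages("devtools")  # run once if devtools is not yet installed
library(devtools)
install_github("ropensci/rfishbase")  # repository given as "owner/repo"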
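For the CRAN release mentioned above, installation is a single call to install.packages. A minimal sketch, assuming a working network connection and a configured CRAN mirror:

```r
# Install the released version from CRAN (requires network access)
install.packages("rfishbase")

# Load the package into the current session
library(rfishbase)
```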

Getting started

rfishbase is developed by Carl Boettiger in collaboration with Duncan Temple Lang and Peter Wainwright, and is part of the rOpenSci project. This software, examples, and documentation are freely provided under the CC0 license.

A preprint of our manuscript introducing the package can be found in the inst/doc directory.

Feature Requests

See something on FishBase that you cannot access using rfishbase? Please request the feature using the issue tracker (or email me if you cannot be persuaded to create a GitHub account). At this time, additional features must be added by HTML scraping because the FishBase team has not yet made the data available in a more machine-readable format. This approach has several disadvantages; please see the HTML Scraping section below.

HTML Scraping

The rfishbase package originally parsed only the summaryXML forms provided (a detailed description of this approach can be found in my notes), and only these functions are covered in the publication in the Journal of Fish Biology. Since then, users have frequently requested support for additional fields that appear on the HTML pages of FishBase but are not available in the XML pages. To provide this access, I have had to resort to HTML scraping. This has several disadvantages:

Keep these caveats in mind when requesting or using one of the scraping functions.
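To make the idea of scraping concrete, here is a minimal sketch of pulling one value out of an HTML page with the XML package. The HTML fragment, the id and class names, and the XPath expressions are all invented for illustration; they are not real FishBase markup, and this is not the code rfishbase itself uses. It shows one general fragility of scraping: the query is tied to the page layout, so a renamed element yields an empty result rather than an error.

```r
library(XML)  # assumed to be installed; provides htmlParse() and xpathSApply()

# Invented fragment standing in for a species page (not real FishBase markup)
page <- '<html><body>
  <div id="ss-main">
    <span class="label">Max length</span>
    <span class="value">120 cm</span>
  </div>
</body></html>'

doc <- htmlParse(page, asText = TRUE)

# This XPath depends on the exact id and class names used in the page
max_length <- xpathSApply(doc, "//div[@id='ss-main']/span[@class='value']", xmlValue)

# Simulate a layout change: the same query against a different id matches nothing
after_rename <- xpathSApply(doc, "//div[@id='summary']/span[@class='value']", xmlValue)

print(max_length)
print(length(after_rename))
```

Because a failed match does not raise an error, scraping code generally needs explicit checks that each query returned something.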

A Better Solution?

All these potential problems could be avoided, and much more data accessed more easily and reliably, if the FishBase repository provided an Application Programming Interface (API) that allowed a machine to query the back-end databases more directly (such as through a RESTful interface), rather than requesting the HTML pages and parsing them. The FishBase team is interested in this feature but does not have the resources to build it at this time. For more information, see issue #8.
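As a rough sketch of what querying such an interface could look like from R: the endpoint URL, the parameter names, and the JSON response below are all hypothetical, since no such service is described here. The point is only that a structured response can be read by field name, with no dependence on page layout.

```r
library(jsonlite)  # assumed to be installed; provides fromJSON()

# Hypothetical endpoint and query parameters (no such FishBase service is implied)
base_url <- "https://api.example.org/species"
query <- c(genus = "Salmo", species = "trutta")
request_url <- paste0(
  base_url, "?",
  paste(names(query), query, sep = "=", collapse = "&")
)

# Invented response body standing in for what such a service might return
response <- '{"genus": "Salmo", "species": "trutta", "fields": ["length", "habitat"]}'
record <- fromJSON(response)

print(request_url)
print(record$fields)  # fields are addressed by name, not by position in a page
```

In real use the response would come from an HTTP request to request_url rather than from a string literal.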

Contributing

Suggestions, bug reports, forks and pull requests are appreciated. Get in touch.

References