bigrf: Big Random Forests: Classification and Regression Forests for Large Data Sets

This is an implementation of Leo Breiman and Adele Cutler's Random Forest algorithms for classification and regression, optimized for performance and for data sets that are too large to be processed in memory. Forests can be built in parallel at two levels. First, the trees of a forest can be grown in parallel on a single machine using foreach. Second, multiple forests can be built in parallel on multiple machines, then merged into one. For large data sets, disk-based big.matrix objects (from the bigmemory package) may be used to store the data and intermediate computations, which prevents excessive virtual memory swapping by the operating system. Currently, only classification forests are implemented, with a subset of the functionality in Breiman and Cutler's original code. More functionality and regression trees may be added in the future.
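The two levels of parallelism described above can be illustrated with a small, language-neutral sketch. It is written in Python using only the standard library and does not use or reproduce the bigrf API: the names `grow_stump`, `grow_forest`, `merge_forests` and `predict` are hypothetical, the "trees" are single-split stumps rather than full Breiman-Cutler trees, and a thread pool stands in for a foreach backend. The point is only that a forest is a collection of independently grown trees, so forests built separately can be merged by pooling their trees.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def grow_stump(data, labels, seed):
    """Grow one single-split tree (a stump) on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n = len(data)
    rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
    feature = rng.randrange(len(data[0]))        # one randomly chosen feature
    best = None
    for r in rows:
        threshold = data[r][feature]
        left = [labels[i] for i in rows if data[i][feature] <= threshold]
        right = [labels[i] for i in rows if data[i][feature] > threshold]
        # Rows misclassified if each side predicts its own majority class.
        errors = sum(
            len(side) - Counter(side).most_common(1)[0][1]
            for side in (left, right)
            if side
        )
        if best is None or errors < best[0]:
            left_class = Counter(left).most_common(1)[0][0]
            right_class = Counter(right).most_common(1)[0][0] if right else left_class
            best = (errors, threshold, left_class, right_class)
    _, threshold, left_class, right_class = best
    return (feature, threshold, left_class, right_class)


def grow_forest(data, labels, n_trees, seed, workers=2):
    """Level 1: grow the trees of one forest concurrently.

    Threads are used here only to keep the sketch simple; a process pool
    would be needed for real CPU parallelism in CPython.
    """
    seeds = [seed * 1000 + t for t in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: grow_stump(data, labels, s), seeds))


def merge_forests(*forests):
    """Level 2: forests built separately are merged by pooling their trees."""
    return [tree for forest in forests for tree in forest]


def predict(forest, row):
    """Classify one row by majority vote over all trees in the forest."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Toy data: the first column separates the classes, the second is noise.
data = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.8, 1.0], [0.9, 0.8], [1.0, 1.2]]
labels = ["a", "a", "a", "b", "b", "b"]

forest_1 = grow_forest(data, labels, n_trees=5, seed=1)  # e.g. built on machine 1
forest_2 = grow_forest(data, labels, n_trees=5, seed=2)  # e.g. built on machine 2
merged = merge_forests(forest_1, forest_2)
```

Because every tree depends only on its own seed and bootstrap sample, the order in which trees or forests are built does not matter, which is what makes both levels of parallelism possible.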

Version: 0.1-11
Depends: R (≥ 2.14), methods, bigmemory
Imports: foreach
LinkingTo: bigmemory, BH
Suggests: MASS, doParallel
OS_type: unix
Published: 2014-05-16
Author: Aloysius Lim [aut, cre], Leo Breiman [aut], Adele Cutler [aut]
Maintainer: Aloysius Lim <aloysius.lim at gmail.com>
BugReports: https://github.com/aloysius-lim/bigrf/issues
License: GPL-3
Copyright: 2013-2014 Aloysius Lim
URL: https://github.com/aloysius-lim/bigrf
NeedsCompilation: yes
In views: HighPerformanceComputing, MachineLearning
CRAN checks: bigrf results

Downloads:

Reference manual: bigrf.pdf
Package source: bigrf_0.1-11.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: not available
OS X Snow Leopard binaries: r-release: bigrf_0.1-11.tgz, r-oldrel: bigrf_0.1-11.tgz
OS X Mavericks binaries: r-release: bigrf_0.1-11.tgz
Old sources: bigrf archive