bigrf: Big Random Forests: Classification and Regression Forests for
Large Data Sets
This is an implementation of Leo Breiman and Adele
Cutler's Random Forest algorithms for classification and
regression, with optimizations for performance and for handling
of data sets that are too large to be processed in memory.
Forests can be built in parallel at two levels. First, trees
can be grown in parallel on a single machine using the foreach package.
Second, multiple forests can be built in parallel on multiple
machines, then merged into one. For large data sets, disk-based
big.matrix objects may be used for storing data and intermediate
computations, to prevent excessive virtual memory swapping by
the operating system. Currently, only classification forests
with a subset of the functionality in Breiman and Cutler's
original code are implemented. More functionality and
regression trees will be added in the future. See file
INSTALL-WINDOWS in the source package for Windows installation
instructions.
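
The grow-in-parallel-then-merge idea described above can be sketched with foreach. This is a minimal illustration under stated assumptions, not bigrf's own interface: it swaps in the randomForest package and its combine() function as a stand-in forest implementation, uses doParallel as the foreach backend, and runs on made-up toy data. The worker count and tree counts are arbitrary.

```r
# Illustrative sketch only: randomForest is a stand-in here, not the bigrf API.
library(foreach)
library(doParallel)
library(randomForest)

# Toy data (assumed for the example): 100 rows, 5 numeric predictors,
# and a two-level class label.
set.seed(1)
x <- matrix(runif(500), nrow = 100)
y <- gl(2, 50)

# Start two local worker processes and register them as the foreach backend.
cl <- makeCluster(2)
registerDoParallel(cl)

# Each task grows a 50-tree sub-forest on a worker; the sub-forests are
# then merged pairwise into a single forest by randomForest::combine.
forest <- foreach(ntree = rep(50, 4),
                  .combine = randomForest::combine,
                  .packages = "randomForest") %dopar% {
  randomForest(x, y, ntree = ntree)
}

stopCluster(cl)

forest$ntree  # total number of trees in the merged forest
```

Splitting the work into several small sub-forests keeps each task independent, so the only coordination needed is the final merge step.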
Downloads: