rsparse is an R package for statistical learning primarily on sparse matrices - matrix factorizations, factorization machines, out-of-core regression. Many of the implemented algorithms are particularly useful for recommender systems and NLP.
On top of that we provide some optimized routines to work on sparse matrices - multithreaded <dense, sparse> matrix multiplications and improved support for sparse matrices in CSR format (Matrix::RsparseMatrix).
We’ve paid attention to the implementation details - we try to avoid data copies, utilize multiple threads via OpenMP and use SIMD where appropriate. The package allows working with datasets that have millions of rows and millions of columns.
Support
Please reach out to us if you need commercial support - hello@rexy.ai.
Features
Classification/Regression
Follow the proximally-regularized leader (FTRL), which allows solving very large linear/logistic regression problems with elastic-net penalty. The solver uses stochastic gradient descent with adaptive learning rates (so it can be used for online learning - it is not necessary to load all data into RAM). See Ad Click Prediction: a View from the Trenches for more examples.
Only logistic regression is implemented at the moment.
The native format for matrices is CSR - Matrix::RsparseMatrix. However, the common R Matrix::CsparseMatrix (dgCMatrix) will be converted automatically.
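A minimal usage sketch, assuming the FTRL R6 class with partial_fit() and predict() methods; the exact constructor argument names below are an assumption, so check ?FTRL in your installed version:

```r
library(rsparse)
library(Matrix)

# toy sparse design matrix, converted to CSR - the solver's native format
set.seed(1)
x = rsparsematrix(1000, 100, density = 0.05)
x_csr = as(x, "RsparseMatrix")
y = sample(c(0, 1), 1000, replace = TRUE)

# assuming FTRL$new() accepts elastic-net style regularization parameters -
# see ?FTRL for the exact names and defaults
model = FTRL$new(learning_rate = 0.05, lambda = 1, l1_ratio = 1)

# partial_fit() allows incremental / out-of-core training on chunks of rows
model$partial_fit(x_csr, y)
preds = model$predict(x_csr)
```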
Factorization Machines - a supervised learning algorithm which learns second-order polynomial interactions in a factorized way. We provide a highly optimized, SIMD-accelerated implementation.
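To illustrate what "factorized" second-order interactions mean, here is a small, package-independent sketch of the Factorization Machine prediction rule; the names (w0, w, V) are illustrative only and are not part of the rsparse API:

```r
# Factorization Machine prediction for a single dense feature vector x:
#   y(x) = w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i, ], V[j, ]> * x[i]*x[j]
# The pairwise term is computed in O(n * rank) via the standard identity
#   sum_{i<j} <v_i, v_j> x_i x_j
#     = 0.5 * sum_f ( (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 )
fm_predict_one = function(x, w0, w, V) {
  linear   = sum(w * x)
  s1 = colSums(V * x)       # sum_i V[i, f] * x[i]   for each factor f
  s2 = colSums((V * x)^2)   # sum_i (V[i, f] * x[i])^2
  pairwise = 0.5 * sum(s1^2 - s2)
  w0 + linear + pairwise
}

# toy example: 5 features, rank-2 factors
set.seed(42)
x  = c(1, 0, 2, 0, 1)
w0 = 0.1
w  = rnorm(5)
V  = matrix(rnorm(10), nrow = 5, ncol = 2)
fm_predict_one(x, w0, w, V)
```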
Matrix Factorizations
Vanilla Maximum Margin Matrix Factorization - a classic approach for “rating” prediction. See the WRMF class and constructor option feedback = "explicit". The original paper which introduced MMMF can be found here.
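A hedged usage sketch for the "explicit feedback" (rating-prediction) setting, assuming the WRMF class exposes fit_transform(); argument names may differ slightly from your installed version, see ?WRMF:

```r
library(rsparse)
library(Matrix)

# toy user-item "rating" matrix (rows = users, columns = items)
set.seed(1)
ratings = rsparsematrix(100, 50, density = 0.1,
                        rand.x = function(n) sample(1:5, n, replace = TRUE))

# feedback = "explicit" corresponds to the MMMF-style rating-prediction setting
model = WRMF$new(rank = 8, lambda = 1, feedback = "explicit")

# fit_transform() returns the user embeddings; item embeddings are stored in
# model$components (assumption - check the class documentation)
user_emb = model$fit_transform(ratings, n_iter = 10)
```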
Optimized matrix operations
multithreaded %*% and tcrossprod() for <dgRMatrix, matrix>
multithreaded %*% and crossprod() for <matrix, dgCMatrix>
natively slice CSR matrices (Matrix::RsparseMatrix) without converting them to triplet / CSC
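For example (a short sketch; the multithreaded methods for these signatures are dispatched once rsparse is loaded):

```r
library(rsparse)
library(Matrix)

set.seed(1)
x_csc = rsparsematrix(10000, 200, density = 0.01)   # dgCMatrix (CSC)
x_csr = as(x_csc, "RsparseMatrix")                   # dgRMatrix (CSR)

d1 = matrix(rnorm(200 * 16), nrow = 200, ncol = 16)
d2 = matrix(rnorm(16 * 10000), nrow = 16, ncol = 10000)

y1 = x_csr %*% d1              # <dgRMatrix, matrix> product
y2 = tcrossprod(x_csr, t(d1))  # same result via tcrossprod()
y3 = d2 %*% x_csc              # <matrix, dgCMatrix> product
y4 = crossprod(t(d2), x_csc)   # same result via crossprod()

# slice CSR rows directly, without conversion to triplet / CSC format
first_rows = x_csr[1:100, ]
```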
Installation
Most of the algorithms benefit from OpenMP, and many of them can utilize a high-performance BLAS implementation. If you want to get the most out of the package, please read the section below carefully.
It is recommended to:
Use a high-performance BLAS (such as OpenBLAS, MKL, or Apple Accelerate).
Add proper compiler optimizations to your ~/.R/Makevars. For example, on recent processors (with AVX support) and a compiler with OpenMP support, the following lines can be a good option:

```txt
CXX11FLAGS += -O3 -march=native -mavx -fopenmp -ffast-math
CXXFLAGS += -O3 -march=native -mavx -fopenmp -ffast-math
```
If you are on a Mac, follow the instructions here. After installing clang4, additionally add the line PKG_CXXFLAGS += -DARMA_USE_OPENMP to your ~/.R/Makevars. After that, install rsparse in the usual way.
Materials
Note that the syntax in these posts/slides is not up to date, since the package was under active development.