mlvocab: Vocabulary and Corpus Preprocessing for Natural Language Pipelines

Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term, and term co-occurrence matrices. All functions allow full or partial hashing of the terms in the vocabulary.
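To make the data structures named above concrete, here is a minimal, language-neutral sketch in Python. It does not use mlvocab and is not meant to reflect its implementation: the function names (`build_vocab`, `to_indices`, `doc_term_counts`), whitespace tokenization, the 1-based indexing, and the CRC32 bucket scheme for out-of-vocabulary terms are all assumptions made for illustration only.

```python
import zlib


def build_vocab(docs):
    """Map each distinct term to a 1-based index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for term in doc.split():
            vocab.setdefault(term, len(vocab) + 1)
    return vocab


def to_indices(doc, vocab, n_buckets=0):
    """Convert a document into an integer sequence.

    Known terms map to their vocabulary index. Unknown terms are hashed into
    one of `n_buckets` extra indices placed after the vocabulary (a hypothetical
    form of partial hashing); with n_buckets == 0 they are dropped.
    """
    out = []
    for term in doc.split():
        if term in vocab:
            out.append(vocab[term])
        elif n_buckets > 0:
            bucket = zlib.crc32(term.encode("utf-8")) % n_buckets
            out.append(len(vocab) + 1 + bucket)
    return out


def doc_term_counts(docs, vocab):
    """Build a dense document-term count matrix (rows: documents, columns: terms)."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for ix in to_indices(doc, vocab):
            row[ix - 1] += 1
        rows.append(row)
    return rows


corpus = ["the cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
seqs = [to_indices(d, vocab) for d in corpus]
dtm = doc_term_counts(corpus, vocab)
```

In this sketch, a term missing from the vocabulary still receives a stable integer when `n_buckets` is positive, so unseen words can be encoded without growing the vocabulary. A real package would typically store the matrix in a sparse format rather than as nested lists.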

Version: 0.0.1
Depends: R (≥ 3.4.0)
Imports: Rcpp (≥ 0.12), Matrix, digest (≥ 0.6.8), sparsepp (≥ 0.2.0)
LinkingTo: Rcpp, digest (≥ 0.6.8), sparsepp (≥ 0.2.0)
Suggests: testthat, knitr
Published: 2018-04-13
Author: Vitalie Spinu [aut, cre]
Maintainer: Vitalie Spinu <spinuvit at gmail.com>
BugReports: https://github.com/vspinu/mlvocab/issues
License: GPL-3
URL: https://github.com/vspinu/mlvocab/
NeedsCompilation: yes
SystemRequirements: C++11
Materials: README
CRAN checks: mlvocab results

Downloads:

Reference manual: mlvocab.pdf
Package source: mlvocab_0.0.1.tar.gz
Windows binaries: r-prerel: mlvocab_0.0.1.zip, r-release: mlvocab_0.0.1.zip, r-oldrel: not available
OS X binaries: r-prerel: mlvocab_0.0.1.tgz, r-release: mlvocab_0.0.1.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=mlvocab to link to this page.