cleanNLP: A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may choose between a Python back end built on 'spaCy' <https://spacy.io> and a Java back end built on 'CoreNLP' <http://stanfordnlp.github.io/CoreNLP/>. A minimal back end with no external dependencies is also provided. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and word embeddings. Summary statistics on the frequencies of token unigrams, part-of-speech tags, and dependency types are also included to assist with analyses.
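To make the idea of "normalized tables" concrete, here is a minimal, language-agnostic sketch written in Python using only the standard library. It is not the package's API or implementation: the function names (`tokenize_corpus`, `unigram_counts`), the column names (`doc_id`, `sentence_id`, `token_id`, `word`), and the regex-based splitting are all assumptions made for illustration. It shows one row per token, keyed by document, sentence, and token position, plus a unigram frequency summary computed from that table.

```python
import re
from collections import Counter


def tokenize_corpus(docs):
    """Return one row per token: (doc_id, sentence_id, token_id, word).

    Hypothetical helper: sentence and token splitting use naive regexes,
    not a real NLP back end.
    """
    rows = []
    for doc_id, text in enumerate(docs, start=1):
        # Split after sentence-final punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        for sentence_id, sentence in enumerate(sentences, start=1):
            # Words and single punctuation marks become separate tokens.
            tokens = re.findall(r"\w+|[^\w\s]", sentence)
            for token_id, word in enumerate(tokens, start=1):
                rows.append((doc_id, sentence_id, token_id, word))
    return rows


def unigram_counts(rows):
    """Count lower-cased alphanumeric tokens across the whole token table."""
    return Counter(word.lower() for _, _, _, word in rows if word.isalnum())


corpus = ["The cat sat. The dog barked!", "A cat slept."]
token_table = tokenize_corpus(corpus)
counts = unigram_counts(token_table)

for row in token_table:
    print(row)
print(counts.most_common(3))
```

Because every row carries its own keys, further annotation tables (for example entities or dependencies) could be joined back to tokens on those keys, which is the general appeal of a tidy layout.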

Version: 1.9.0
Depends: R (≥ 2.10)
Imports: dplyr (≥ 0.5.0), readr (≥ 1.1.0), Matrix (≥ 1.2), stats, methods, utils
Suggests: reticulate (≥ 0.7), rJava (≥ 0.9-8), tokenizers (≥ 0.1.4), RCurl (≥ 1.95), knitr (≥ 1.15), rmarkdown (≥ 1.4), testthat (≥ 1.0.1), covr (≥ 2.2.2)
Published: 2017-05-27
Author: Taylor B. Arnold [aut, cre]
Maintainer: Taylor B. Arnold <taylor.arnold at acm.org>
BugReports: http://github.com/statsmaths/cleanNLP/issues
License: GPL-3
URL: https://statsmaths.github.io/cleanNLP/
NeedsCompilation: no
SystemRequirements: Python (≥ 2.7.0); spaCy <https://spacy.io/> (≥ 1.8); Java (≥ 7.0); Stanford CoreNLP <http://nlp.stanford.edu/software/corenlp.shtml> (≥ 3.7.0)
Materials: NEWS
CRAN checks: cleanNLP results

Downloads:

Reference manual: cleanNLP.pdf
Vignettes: Exploring the State of the Union Addresses: A Case Study with cleanNLP; A Data Model for the NLP Pipeline
Package source: cleanNLP_1.9.0.tar.gz
Windows binaries: r-devel: cleanNLP_1.9.0.zip, r-release: cleanNLP_1.9.0.zip, r-oldrel: cleanNLP_1.9.0.zip
OS X El Capitan binaries: r-release: cleanNLP_1.9.0.tgz
OS X Mavericks binaries: r-oldrel: cleanNLP_1.9.0.tgz
Old sources: cleanNLP archive

Linking:

Please use the canonical form https://CRAN.R-project.org/package=cleanNLP to link to this page.