cleanNLP: A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline uses Stanford's CoreNLP library. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part-of-speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.
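
As a rough illustration of what "normalized tables" can mean here, the sketch below builds two small tables by hand and joins them with dplyr. It does not call cleanNLP or CoreNLP. The table names (token, dependency), the key columns (id, sid, tid, tid_target), and the example sentence are assumptions made for this sketch, not the package's documented schema.

```r
library(dplyr)

# Hypothetical token table: one row per token, keyed by
# document id (id), sentence id (sid), and token id (tid).
token <- data.frame(
  id   = c(1L, 1L, 1L, 1L),
  sid  = c(1L, 1L, 1L, 1L),
  tid  = c(1L, 2L, 3L, 4L),
  word = c("Paris", "is", "lovely", "."),
  pos  = c("NNP", "VBZ", "JJ", "."),
  stringsAsFactors = FALSE
)

# Hypothetical dependency table: one row per relation. It stores
# only keys and the relation label, so word-level details live in
# exactly one place (the token table).
dependency <- data.frame(
  id         = c(1L, 1L, 1L),
  sid        = c(1L, 1L, 1L),
  tid        = c(1L, 2L, 4L),   # dependent token
  tid_target = c(3L, 3L, 3L),   # governing token
  relation   = c("nsubj", "cop", "punct"),
  stringsAsFactors = FALSE
)

# Recover the word and tag of each dependent by joining on the shared keys.
joined <- dependency %>%
  inner_join(token, by = c("id", "sid", "tid")) %>%
  select(word, pos, relation)

# Tag frequencies fall out of a plain grouped count on the token table.
pos_counts <- token %>% count(pos)
```

Keeping each annotation type in its own table with shared keys means analyses reduce to ordinary joins and counts rather than custom traversal of a nested parser output.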

Version: 0.24
Depends: dplyr, magrittr, R (≥ 2.10)
Imports: readr, rJava, RCurl
Published: 2016-11-11
Author: Taylor B. Arnold
Maintainer: Taylor B. Arnold <taylor.arnold at>
License: GPL-3
NeedsCompilation: no
SystemRequirements: Java (>= 7.0); Stanford CoreNLP < software/corenlp.shtml> (>= 3.5.2)
Materials: README
CRAN checks: cleanNLP results


Reference manual: cleanNLP.pdf
Package source: cleanNLP_0.24.tar.gz
Windows binaries: r-devel:, r-release:, r-oldrel:
OS X Mavericks binaries: r-release: cleanNLP_0.24.tgz, r-oldrel: cleanNLP_0.24.tgz

