cleanNLP: A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline uses Stanford's CoreNLP library. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part-of-speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.
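
As a rough illustration of what "normalized tables" can mean in practice, the sketch below builds two small data frames by hand and combines them with dplyr. It does not call cleanNLP itself. The table names (document, token), the column names (doc_id, sentence_id, token_id, word, pos), and the sample rows are all assumptions made for this example and are not taken from the package's actual schema.

```r
library(dplyr)

# Hypothetical document table: one row per document.
document <- data.frame(
  doc_id = c(1L, 2L),
  language = c("en", "en"),
  stringsAsFactors = FALSE
)

# Hypothetical token table: one row per token, keyed back to its document.
token <- data.frame(
  doc_id = c(1L, 1L, 1L, 2L, 2L),
  sentence_id = c(1L, 1L, 1L, 1L, 1L),
  token_id = c(1L, 2L, 3L, 1L, 2L),
  word = c("Dogs", "bark", "loudly", "Cats", "sleep"),
  pos = c("NNS", "VBP", "RB", "NNS", "VBP"),
  stringsAsFactors = FALSE
)

# Count part-of-speech tags per document, then attach document metadata
# through the shared doc_id key.
pos_counts <- token %>%
  group_by(doc_id, pos) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  inner_join(document, by = "doc_id")

print(pos_counts)
```

Keeping each annotation type in its own table with a shared key is what lets ordinary dplyr verbs such as group_by, summarise, and inner_join do the analysis, without any NLP-specific object types.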

Version: 0.24
Depends: dplyr, magrittr, R (≥ 2.10)
Imports: readr, rJava, RCurl
Published: 2016-11-11
Author: Taylor B. Arnold
Maintainer: Taylor B. Arnold <taylor.arnold at acm.org>
License: GPL-3
NeedsCompilation: no
SystemRequirements: Java (>= 7.0); Stanford CoreNLP <http://nlp.stanford.edu/software/corenlp.shtml> (>= 3.5.2)
Materials: README
CRAN checks: cleanNLP results

Downloads:

Reference manual: cleanNLP.pdf
Package source: cleanNLP_0.24.tar.gz
Windows binaries: r-devel: cleanNLP_0.24.zip, r-release: cleanNLP_0.24.zip, r-oldrel: cleanNLP_0.24.zip
OS X Mavericks binaries: r-release: cleanNLP_0.24.tgz, r-oldrel: cleanNLP_0.24.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=cleanNLP to link to this page.