cleanNLP: A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of the 'udpipe' back end with no external dependencies, or a Python back end with 'spaCy'. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, and dependency parsing.
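
As a rough illustration of what "a set of normalized tables" can look like, the sketch below builds a document table and a token table from a tiny corpus, with one row per record and shared keys linking them. It is written in plain Python with a naive regular-expression tokenizer; it does not call cleanNLP, 'udpipe', or 'spaCy', and the function name `annotate` and the column names `doc_id`, `sid`, `tid`, and `token` are assumptions chosen for this example.

```python
import re


def annotate(texts):
    """Return a document table and a token table as lists of row dicts.

    Hypothetical helper for illustration only; a real back end would use a
    trained model rather than regular expressions.
    """
    documents = []
    tokens = []
    for doc_id, text in enumerate(texts, start=1):
        # One row per document.
        documents.append({"doc_id": doc_id, "text": text})
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        for sid, sentence in enumerate(sentences, start=1):
            # Naive tokenization: runs of word characters, or single
            # punctuation marks.
            words = re.findall(r"\w+|[^\w\s]", sentence)
            for tid, token in enumerate(words, start=1):
                # One row per token, keyed by document, sentence, and
                # token position.
                tokens.append(
                    {"doc_id": doc_id, "sid": sid, "tid": tid, "token": token}
                )
    return {"document": documents, "token": tokens}


corpus = ["The cat sat. It purred!", "Dogs bark."]
tables = annotate(corpus)
for row in tables["token"]:
    print(row)
```

Because every token row carries the key of its document, further annotations such as part of speech tags or named entities could be stored as extra columns or as additional tables joined on the same keys.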

Version: 3.1.0
Depends: R (≥ 3.5.0)
Imports: Matrix (≥ 1.2), udpipe, reticulate, stringi, stats, methods
Suggests: knitr (≥ 1.15), rmarkdown (≥ 1.4), testthat (≥ 1.0.1), covr (≥ 2.2.2)
Published: 2024-05-20
DOI: 10.32614/CRAN.package.cleanNLP
Author: Taylor B. Arnold [aut, cre]
Maintainer: Taylor B. Arnold <tarnold2 at>
License: LGPL-2
NeedsCompilation: no
SystemRequirements: Python (≥ 3.7.0)
Citation: cleanNLP citation info
Materials: NEWS
CRAN checks: cleanNLP results


Reference manual: cleanNLP.pdf
Vignettes: Exploring the State of the Union Addresses: A Case Study with cleanNLP
Creating Text Visualizations with Wikipedia Data


Package source: cleanNLP_3.1.0.tar.gz
Windows binaries: r-devel:, r-release:, r-oldrel:
macOS binaries: r-release (arm64): cleanNLP_3.1.0.tgz, r-oldrel (arm64): cleanNLP_3.1.0.tgz, r-release (x86_64): cleanNLP_3.1.0.tgz, r-oldrel (x86_64): cleanNLP_3.1.0.tgz
Old sources: cleanNLP archive

Reverse dependencies:

Reverse enhances: NLP

