Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>) tokenization conventions are used by default.
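As a rough illustration of what Wordpiece tokenization does with a vocabulary, here is a minimal Python sketch of greedy longest-match-first splitting with the BERT-style "##" continuation prefix. It is not this package's R interface: the function names `wordpiece_split` and `tokenize`, the `max_chars` cutoff, and the toy vocabulary are all hypothetical. The BERT conventions are also simplified here to lowercasing and whitespace splitting.

```python
def wordpiece_split(word, vocab, unk_token="[UNK]", max_chars=100):
    """Split one word into wordpieces by greedy longest-match-first lookup."""
    # Assumption: very long words are mapped straight to the unknown token.
    if len(word) > max_chars:
        return [unk_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Lowercase, split on whitespace, then apply wordpiece_split per word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_split(word, vocab))
    return tokens


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"[UNK]", "un", "##aff", "##able", "token", "##ization", "##s"}

print(tokenize("Unaffable tokenizations", toy_vocab))
# ['un', '##aff', '##able', 'token', '##ization', '##s']
```

A word with no matching piece at some position collapses to the unknown token as a whole, so `tokenize("xyz", toy_vocab)` yields `['[UNK]']` under this sketch.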
Version: | 1.0.2 |
Depends: | R (≥ 3.3.0) |
Imports: | digest (≥ 0.6.5), purrr (≥ 0.2.3), rappdirs (≥ 0.3), stringi (≥ 1.0) |
Suggests: | testthat (≥ 2.1.0), knitr, rmarkdown, covr |
Published: | 2021-02-11 |
Author: | Jonathan Bratt |
Maintainer: | Jonathan Bratt <jonathan.bratt at macmillan.com> |
BugReports: | https://github.com/jonathanbratt/wordpiece/issues |
License: | Apache License (≥ 2) |
URL: | https://github.com/jonathanbratt/wordpiece |
NeedsCompilation: | no |
Materials: | README NEWS |
CRAN checks: | wordpiece results |
Reference manual: | wordpiece.pdf |
Vignettes: | Using wordpiece |
Package source: | wordpiece_1.0.2.tar.gz |
Windows binaries: | r-devel: wordpiece_1.0.2.zip, r-release: wordpiece_1.0.2.zip, r-oldrel: wordpiece_1.0.2.zip |
macOS binaries: | r-release: wordpiece_1.0.2.tgz, r-oldrel: wordpiece_1.0.2.tgz |
Please use the canonical form https://CRAN.R-project.org/package=wordpiece to link to this page.