Package: wordpiece 2.1.3

Jonathan Bratt

wordpiece: R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' (<arxiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arxiv:1810.04805>) tokenization conventions are used by default.
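
As an illustration of what Wordpiece tokenization does, the greedy longest-match-first splitting used under the BERT conventions can be sketched in base R. This is a minimal sketch and not the package's own implementation: the toy vocabulary and the function name `tokenize_word` are hypothetical, and it handles a single, already lowercased word only.

```r
# Hypothetical toy vocabulary; "##" marks a piece that continues a word.
toy_vocab <- c("[UNK]", "un", "##aff", "##able", "token", "##ize", "##s")

# Split one word into wordpieces by greedy longest-match-first lookup.
# Returns the unknown token if any part of the word cannot be matched.
tokenize_word <- function(word, vocab, unk_token = "[UNK]") {
  n <- nchar(word)
  pieces <- character(0)
  start <- 1L
  while (start <= n) {
    end <- n
    found <- NA_character_
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      # Pieces that do not begin the word carry the "##" prefix.
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      end <- end - 1L
    }
    if (is.na(found)) return(unk_token)
    pieces <- c(pieces, found)
    start <- end + 1L
  }
  pieces
}

tokenize_word("unaffable", toy_vocab)
tokenize_word("tokenizes", toy_vocab)
tokenize_word("xyz", toy_vocab)
```

With this toy vocabulary, the first call splits the word into "un", "##aff", "##able", and the last call falls back to the unknown token because no piece of "xyz" is in the vocabulary. A real vocabulary, such as the ones the package loads, is far larger, and whole sentences are first split into words and punctuation before this step.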

Authors: Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]

wordpiece_2.1.3.tar.gz
wordpiece_2.1.3.zip (r-4.7), wordpiece_2.1.3.zip (r-4.6), wordpiece_2.1.3.zip (r-4.5)
wordpiece_2.1.3.tgz (r-4.6-any), wordpiece_2.1.3.tgz (r-4.5-any)
wordpiece_2.1.3.tar.gz (r-4.7-any), wordpiece_2.1.3.tar.gz (r-4.6-any)
wordpiece_2.1.3.tgz (r-4.6-emscripten)

# Install 'wordpiece' in R:
install.packages('wordpiece', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/wordpiece/issues

4.60 score, 8 stars, 7 scripts, 213 downloads, 7 exports, 18 dependencies

Last updated from: 3eb92c7595. Checks: 7 NOTE, 2 OK. Indexed: no.

Target                 Result  Time
linux-devel-x86_64     NOTE    122
source / vignettes     OK      181
linux-release-x86_64   NOTE    131
macos-release-arm64    NOTE    140
macos-oldrel-arm64     NOTE    81
windows-devel          NOTE    71
windows-release        NOTE    75
windows-oldrel         NOTE    88
wasm-release           OK      117

Exports: load_or_retrieve_vocab, load_vocab, prepare_vocab, set_wordpiece_cache_dir, wordpiece_cache_dir, wordpiece_tokenize, wordpiece_vocab

Dependencies: cachem, cli, digest, dlr, fastmap, fastmatch, fs, glue, lifecycle, magrittr, memoise, piecemaker, rappdirs, rlang, stringi, stringr, vctrs, wordpiece.data

Using wordpiece

Rendered from basic_usage.Rmd using knitr::rmarkdown on Jun 01 2026.

Last update: 2021-09-27
Started: 2021-01-12