Package: wordpiece 2.1.3

Jonathan Bratt

wordpiece: R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' (<arxiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arxiv:1810.04805>) tokenization conventions are used by default.
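The core tokenization step can be sketched in base R. This is a minimal illustration of the greedy longest-match-first procedure used by BERT's reference tokenizer, not this package's own implementation. The function `wordpiece_word()`, its `max_chars` parameter, and the toy vocabulary are hypothetical. The sketch assumes the word has already been split off by whitespace and punctuation pre-tokenization, that continuation pieces carry BERT's `##` prefix, and that a word which cannot be fully covered by the vocabulary maps to `[UNK]`.

```r
# Hypothetical helper (not part of the wordpiece package): greedy
# longest-match-first WordPiece tokenization of ONE pre-split word.
# Assumptions: `vocab` is a plain character vector, non-initial pieces use
# the BERT "##" prefix, and an uncoverable word becomes `unk_token`.
wordpiece_word <- function(word, vocab, unk_token = "[UNK]", max_chars = 100) {
  n <- nchar(word)
  # Very long words are treated as unknown outright (assumed cutoff).
  if (n > max_chars) return(unk_token)
  pieces <- character(0)
  start <- 1L
  while (start <= n) {
    end <- n
    piece <- NA_character_
    # Try the longest remaining substring first, shrinking it from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        piece <- candidate
        break
      }
      end <- end - 1L
    }
    # No prefix of the remainder is in the vocabulary: the whole word is unknown.
    if (is.na(piece)) return(unk_token)
    pieces <- c(pieces, piece)
    start <- end + 1L
  }
  pieces
}

vocab <- c("un", "##aff", "##able", "play", "##ing")
wordpiece_word("unaffable", vocab)  # returns c("un", "##aff", "##able")
wordpiece_word("playing", vocab)    # returns c("play", "##ing")
wordpiece_word("xyz", vocab)        # returns "[UNK]"
```

In the package itself, `wordpiece_tokenize()` is the exported entry point; its exact arguments are best checked in its help page.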

Authors: Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]


# Install 'wordpiece' in R:
install.packages('wordpiece', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/wordpiece/issues

Score 4.60, 8 stars, 7 scripts, 203 downloads, 7 exports, 18 dependencies.

Last updated 3 years ago from: 3eb92c7595. Checks: 3 OK, 4 NOTE. Indexed: no.

Target             Result   Date
Doc / Vignettes    OK       Oct 27 2024
R-4.5-win          NOTE     Oct 27 2024
R-4.5-linux        NOTE     Oct 27 2024
R-4.4-win          NOTE     Oct 27 2024
R-4.4-mac          NOTE     Oct 27 2024
R-4.3-win          OK       Oct 27 2024
R-4.3-mac          OK       Oct 27 2024

Exports: load_or_retrieve_vocab, load_vocab, prepare_vocab, set_wordpiece_cache_dir, wordpiece_cache_dir, wordpiece_tokenize, wordpiece_vocab

Dependencies: cachem, cli, digest, dlr, fastmap, fastmatch, fs, glue, lifecycle, magrittr, memoise, piecemaker, rappdirs, rlang, stringi, stringr, vctrs, wordpiece.data

Using wordpiece

Rendered from basic_usage.Rmd using knitr::rmarkdown on Oct 27 2024.

Last update: 2021-09-27
Started: 2021-01-12