This package is considered a duplicate. The official version of this package is found at: https://macmillancontentscience.r-universe.dev/wordpiece

Package: wordpiece 2.1.3

Jonathan Bratt

wordpiece: R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' (<arxiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arxiv:1810.04805>) tokenization conventions are used by default.
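
The greedy longest-match-first idea behind Wordpiece tokenization can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `toy_vocab` and `tokenize_word` are hypothetical names, the vocabulary is invented for the example, and the sketch handles a single lowercase ASCII word only (no case folding, whitespace splitting, or punctuation handling). The "##" continuation prefix follows the 'BERT' convention.

```r
# Toy vocabulary (hypothetical; real vocabularies are far larger).
toy_vocab <- c("[UNK]", "un", "##aff", "##able", "##ly", "play", "##ing")

# Split one word into word pieces by greedy longest-match-first.
# Pieces that continue a word carry the BERT-style "##" prefix.
tokenize_word <- function(word, vocab, unk_token = "[UNK]") {
  n <- nchar(word)
  pieces <- character(0)
  start <- 1L
  while (start <= n) {
    end <- n
    found <- NA_character_
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      end <- end - 1L
    }
    # If no piece matches at this position, the whole word is unknown.
    if (is.na(found)) return(unk_token)
    pieces <- c(pieces, found)
    start <- end + 1L
  }
  pieces
}

tokenize_word("unaffable", toy_vocab)  # "un" "##aff" "##able"
tokenize_word("playing", toy_vocab)    # "play" "##ing"
tokenize_word("xyz", toy_vocab)        # "[UNK]"
```

Matching the longest piece first keeps common words whole when they appear in the vocabulary, while rare words fall back to smaller pieces or to the unknown token.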

Authors: Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]

wordpiece_2.1.3.tar.gz
wordpiece_2.1.3.zip (r-4.5), wordpiece_2.1.3.zip (r-4.4), wordpiece_2.1.3.zip (r-4.3)
wordpiece_2.1.3.tgz (r-4.4-any), wordpiece_2.1.3.tgz (r-4.3-any)
wordpiece_2.1.3.tar.gz (r-4.5-noble), wordpiece_2.1.3.tar.gz (r-4.4-noble)
wordpiece_2.1.3.tgz (r-4.4-emscripten), wordpiece_2.1.3.tgz (r-4.3-emscripten)
wordpiece.pdf | wordpiece.html
wordpiece/json (API)
NEWS

# Install 'wordpiece' in R:

install.packages('wordpiece', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/wordpiece/issues

On CRAN:

4.60 score, 8 stars, 7 scripts, 234 downloads, 7 exports, 18 dependencies

Last updated 3 years ago from: 3eb92c7595. Checks: 3 OK, 4 NOTE. Indexed: no.

Target	Result	Latest binary
Doc / Vignettes	OK	Jan 25 2025
R-4.5-win	NOTE	Jan 25 2025
R-4.5-linux	NOTE	Jan 25 2025
R-4.4-win	NOTE	Jan 25 2025
R-4.4-mac	NOTE	Jan 25 2025
R-4.3-win	OK	Jan 25 2025
R-4.3-mac	OK	Jan 25 2025

Exports: load_or_retrieve_vocab, load_vocab, prepare_vocab, set_wordpiece_cache_dir, wordpiece_cache_dir, wordpiece_tokenize, wordpiece_vocab

Dependencies: cachem, cli, digest, dlr, fastmap, fastmatch, fs, glue, lifecycle, magrittr, memoise, piecemaker, rappdirs, rlang, stringi, stringr, vctrs, wordpiece.data

Using wordpiece

Rendered from basic_usage.Rmd using knitr::rmarkdown on Jan 25 2025.

Last update: 2021-09-27
Started: 2021-01-12

Readme and manuals

Help Manual

Help page	Topics
Load a vocabulary file, or retrieve from cache	load_or_retrieve_vocab
Load a vocabulary file	load_vocab
Format a Token List as a Vocabulary	prepare_vocab
Set a Cache Directory for wordpiece	set_wordpiece_cache_dir
Retrieve Directory for wordpiece Cache	wordpiece_cache_dir
Tokenize Sequence with Word Pieces	wordpiece_tokenize