Package: wordpiece.data 2.0.0

wordpiece.data: Data for Wordpiece-Style Tokenization

Provides data to be used by the wordpiece algorithm in order to tokenize text into somewhat meaningful chunks. Included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> and parsed into an R-friendly format.

Authors: Jonathan Bratt [aut], Jon Harmon [aut, cre], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph], Google, Inc [cph]

wordpiece.data_2.0.0.tar.gz
wordpiece.data_2.0.0.zip (r-4.5), wordpiece.data_2.0.0.zip (r-4.4), wordpiece.data_2.0.0.zip (r-4.3)
wordpiece.data_2.0.0.tgz (r-4.4-any), wordpiece.data_2.0.0.tgz (r-4.3-any)
wordpiece.data_2.0.0.tar.gz (r-4.5-noble), wordpiece.data_2.0.0.tar.gz (r-4.4-noble)
wordpiece.data_2.0.0.tgz (r-4.4-emscripten), wordpiece.data_2.0.0.tgz (r-4.3-emscripten)
wordpiece.data.pdf | wordpiece.data.html
wordpiece.data/json (API)
NEWS

# Install 'wordpiece.data' in R:
install.packages('wordpiece.data', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/wordpiece.data/issues

3.18 score, 1 package, 5 scripts, 191 downloads, 1 export, 0 dependencies

Last updated 3 years ago from: f893df5061. Checks: OK: 7. Indexed: no.

Target            Result   Date
Doc / Vignettes   OK       Oct 29 2024
R-4.5-win         OK       Oct 29 2024
R-4.5-linux       OK       Oct 29 2024
R-4.4-win         OK       Oct 29 2024
R-4.4-mac         OK       Oct 29 2024
R-4.3-win         OK       Oct 29 2024
R-4.3-mac         OK       Oct 29 2024

Exports: wordpiece_vocab

Dependencies: none