Package: wordpiece.data 2.0.0
wordpiece.data: Data for Wordpiece-Style Tokenization
Provides data to be used by the wordpiece algorithm in order to tokenize text into somewhat meaningful chunks. Included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> and parsed into an R-friendly format.
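As a rough illustration of what "parsed into an R-friendly format" can mean, the base-R sketch below turns the lines of a BERT-style vocab.txt (one token per line) into a named integer vector. It is not the package's own code: `parse_vocab_lines` and the toy vocabulary are hypothetical names made up for this example, and the 0-based index follows the usual BERT convention of numbering tokens by line position. The internal format used by wordpiece.data is not described here.

```r
# Hypothetical helper, NOT part of wordpiece.data: map each token in a
# BERT-style vocab.txt (one token per line) to its 0-based line index.
parse_vocab_lines <- function(lines) {
  # seq_along() is 1-based, so subtract 1 to get 0-based token ids.
  stats::setNames(seq_along(lines) - 1L, lines)
}

# A tiny made-up vocabulary standing in for the real file contents.
# Tokens starting with "##" are word-continuation pieces.
toy_lines <- c("[PAD]", "[UNK]", "un", "##aff", "##able")
vocab <- parse_vocab_lines(toy_lines)

# Look up the id of a single piece.
vocab[["##aff"]]

# List only the continuation pieces.
names(vocab)[startsWith(names(vocab), "##")]
```

To try the same sketch on a real vocabulary, the lines could be read with `readLines()` from one of the URLs above and passed to the helper in place of `toy_lines`.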
Authors:
wordpiece.data_2.0.0.tar.gz
wordpiece.data_2.0.0.zip (r-4.5), wordpiece.data_2.0.0.zip (r-4.4), wordpiece.data_2.0.0.zip (r-4.3)
wordpiece.data_2.0.0.tgz (r-4.4-any), wordpiece.data_2.0.0.tgz (r-4.3-any)
wordpiece.data_2.0.0.tar.gz (r-4.5-noble), wordpiece.data_2.0.0.tar.gz (r-4.4-noble)
wordpiece.data_2.0.0.tgz (r-4.4-emscripten), wordpiece.data_2.0.0.tgz (r-4.3-emscripten)
wordpiece.data.pdf | wordpiece.data.html
wordpiece.data/json (API)
NEWS
# Install 'wordpiece.data' in R:
install.packages('wordpiece.data', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/macmillancontentscience/wordpiece.data/issues
Last updated 3 years ago from: f893df5061. Checks: OK: 7. Indexed: no.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Oct 29 2024 |
R-4.5-win | OK | Oct 29 2024 |
R-4.5-linux | OK | Oct 29 2024 |
R-4.4-win | OK | Oct 29 2024 |
R-4.4-mac | OK | Oct 29 2024 |
R-4.3-win | OK | Oct 29 2024 |
R-4.3-mac | OK | Oct 29 2024 |
Exports: wordpiece_vocab
Dependencies:
Readme and manuals
Help Manual
Help page | Topics |
---|---|
Load a wordpiece Vocabulary | wordpiece_vocab |