Package: wordpiece 2.1.3
wordpiece: R Implementation of Wordpiece Tokenization
Apply 'Wordpiece' (<arxiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arxiv:1810.04805>) tokenization conventions are used by default.
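To make the description above concrete, here is a minimal base-R sketch of greedy longest-match-first word-piece splitting under the 'BERT' convention, where non-initial pieces carry a `##` prefix. It is not the package's own code: `tokenize_word` and `toy_vocab` are hypothetical names invented for illustration, and the sketch handles a single pre-split, already-normalized word (no whitespace or punctuation splitting, no case folding).

```r
# Hypothetical helper, not part of the wordpiece package: split one word into
# word pieces by repeatedly taking the longest vocabulary entry that matches.
tokenize_word <- function(word, vocab, unk_token = "[UNK]", max_chars = 100L) {
  n <- nchar(word)
  # Assumption: overly long words are mapped straight to the unknown token.
  if (n > max_chars) {
    return(unk_token)
  }
  pieces <- character(0)
  start <- 1L
  while (start <= n) {
    end <- n
    piece <- NA_character_
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      # Non-initial pieces carry the "##" continuation prefix.
      if (start > 1L) {
        candidate <- paste0("##", candidate)
      }
      if (candidate %in% vocab) {
        piece <- candidate
        break
      }
      end <- end - 1L
    }
    # If nothing matches at this position, the whole word is unknown.
    if (is.na(piece)) {
      return(unk_token)
    }
    pieces <- c(pieces, piece)
    start <- end + 1L
  }
  pieces
}

# Toy vocabulary, made up for this example.
toy_vocab <- c("[UNK]", "un", "##aff", "##able", "taco", "##s")

tokenize_word("unaffable", toy_vocab)
tokenize_word("tacos", toy_vocab)
tokenize_word("burrito", toy_vocab)
```

With this toy vocabulary, "unaffable" splits into "un", "##aff", "##able", and a word with no matching pieces falls back to the unknown token. The package itself works on full text and returns token ids from a real vocabulary, so treat this only as an outline of the matching step.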
wordpiece_2.1.3.tar.gz
wordpiece_2.1.3.zip (r-4.5), wordpiece_2.1.3.zip (r-4.4), wordpiece_2.1.3.zip (r-4.3)
wordpiece_2.1.3.tgz (r-4.4-any), wordpiece_2.1.3.tgz (r-4.3-any)
wordpiece_2.1.3.tar.gz (r-4.5-noble), wordpiece_2.1.3.tar.gz (r-4.4-noble)
wordpiece_2.1.3.tgz (r-4.4-emscripten), wordpiece_2.1.3.tgz (r-4.3-emscripten)
wordpiece.pdf | wordpiece.html
wordpiece/json (API)
NEWS
# Install 'wordpiece' in R:
install.packages('wordpiece', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/macmillancontentscience/wordpiece/issues
Last updated 3 years ago from: 3eb92c7595. Checks: OK: 3, NOTE: 4. Indexed: no.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Oct 27 2024 |
R-4.5-win | NOTE | Oct 27 2024 |
R-4.5-linux | NOTE | Oct 27 2024 |
R-4.4-win | NOTE | Oct 27 2024 |
R-4.4-mac | NOTE | Oct 27 2024 |
R-4.3-win | OK | Oct 27 2024 |
R-4.3-mac | OK | Oct 27 2024 |
Exports: load_or_retrieve_vocab, load_vocab, prepare_vocab, set_wordpiece_cache_dir, wordpiece_cache_dir, wordpiece_tokenize, wordpiece_vocab
Dependencies: cachem, cli, digest, dlr, fastmap, fastmatch, fs, glue, lifecycle, magrittr, memoise, piecemaker, rappdirs, rlang, stringi, stringr, vctrs, wordpiece.data
Readme and manuals
Help Manual
Help page | Topics |
---|---|
Load a vocabulary file, or retrieve from cache | load_or_retrieve_vocab |
Load a vocabulary file | load_vocab |
Format a Token List as a Vocabulary | prepare_vocab |
Set a Cache Directory for wordpiece | set_wordpiece_cache_dir |
Retrieve Directory for wordpiece Cache | wordpiece_cache_dir |
Tokenize Sequence with Word Pieces | wordpiece_tokenize |