Package: wordpiece 2.1.3
wordpiece: R Implementation of Wordpiece Tokenization
Apply 'Wordpiece' (<arxiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arxiv:1810.04805>) tokenization conventions are used by default.
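As a rough illustration of what WordPiece tokenization does to a single word, here is a minimal base-R sketch of greedy longest-match-first splitting. It is not the package's implementation. The helper `tokenize_word()`, its arguments, and `toy_vocab` are hypothetical names made up for this example. The sketch assumes the word is already lower-cased and separated from punctuation, and that continuation pieces carry the `##` prefix used in BERT vocabularies.

```r
# Hypothetical helper: greedy longest-match-first split of one normalized word.
# `vocab` is a character vector of tokens; continuation pieces start with "##".
tokenize_word <- function(word, vocab, unk_token = "[UNK]", max_chars = 100L) {
  n <- nchar(word)
  if (n == 0L) return(character(0))
  # Very long words are mapped straight to the unknown token.
  if (n > max_chars) return(unk_token)

  pieces <- character(0)
  start <- 1L
  while (start <= n) {
    end <- n
    found <- NA_character_
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      end <- end - 1L
    }
    # If nothing matches at this position, the whole word is unknown.
    if (is.na(found)) return(unk_token)
    pieces <- c(pieces, found)
    start <- end + 1L
  }
  pieces
}

# Toy vocabulary, invented for this example.
toy_vocab <- c("[UNK]", "i", "like", "ta", "##cos", "!")

tokenize_word("tacos", toy_vocab)    # "ta" "##cos"
tokenize_word("like", toy_vocab)     # "like"
tokenize_word("burrito", toy_vocab)  # "[UNK]"
```

A full tokenizer would first normalize the text and split it into words, then map each resulting piece to its index in the vocabulary. The sketch leaves out both steps.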
Authors:
wordpiece_2.1.3.tar.gz
wordpiece_2.1.3.zip (r-4.7), wordpiece_2.1.3.zip (r-4.6), wordpiece_2.1.3.zip (r-4.5)
wordpiece_2.1.3.tgz (r-4.6-any), wordpiece_2.1.3.tgz (r-4.5-any)
wordpiece_2.1.3.tar.gz (r-4.7-any), wordpiece_2.1.3.tar.gz (r-4.6-any)
wordpiece_2.1.3.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
wordpiece/json (API)
NEWS
Install 'wordpiece' in R:

    install.packages('wordpiece', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/wordpiece/issues
Last updated from: 3eb92c7595. Checks: 7 NOTE, 2 OK. Indexed: no.
| Target | Result | Time |
|---|---|---|
| linux-devel-x86_64 | NOTE | 122 |
| source / vignettes | OK | 181 |
| linux-release-x86_64 | NOTE | 131 |
| macos-release-arm64 | NOTE | 140 |
| macos-oldrel-arm64 | NOTE | 81 |
| windows-devel | NOTE | 71 |
| windows-release | NOTE | 75 |
| windows-oldrel | NOTE | 88 |
| wasm-release | OK | 117 |
Exports: load_or_retrieve_vocab, load_vocab, prepare_vocab, set_wordpiece_cache_dir, wordpiece_cache_dir, wordpiece_tokenize, wordpiece_vocab
Dependencies: cachem, cli, digest, dlr, fastmap, fastmatch, fs, glue, lifecycle, magrittr, memoise, piecemaker, rappdirs, rlang, stringi, stringr, vctrs, wordpiece.data
Readme and manuals
Help Manual
| Help page | Topics |
|---|---|
| Load a vocabulary file, or retrieve from cache | load_or_retrieve_vocab |
| Load a vocabulary file | load_vocab |
| Format a Token List as a Vocabulary | prepare_vocab |
| Set a Cache Directory for wordpiece | set_wordpiece_cache_dir |
| Retrieve Directory for wordpiece Cache | wordpiece_cache_dir |
| Tokenize Sequence with Word Pieces | wordpiece_tokenize |
