Package: morphemepiece 1.2.3
morphemepiece: Morpheme Tokenization
Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
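The lookup-then-fallback flow described above can be sketched in a few lines of base R. This is a toy illustration only, not the package's implementation: `toy_lookup`, `toy_vocab`, `wordpiece_fallback()` and `toy_tokenize()` are hypothetical names, the entries are made up, and the fallback is plain greedy longest-match wordpiece rather than the modified variant the package uses.

```r
# Toy lookup table mapping words to morphemes (hypothetical entries,
# not the data shipped with the package).
toy_lookup <- list(
  unhappiness = c("un##", "happy", "##ness"),
  walked = c("walk", "##ed")
)

# Toy vocabulary used only by the fallback (hypothetical).
toy_vocab <- c("play", "##ing", "##s", "walk", "[UNK]")

# Assumption: standard greedy longest-match-first wordpiece, where "##"
# marks a piece that continues a word.
wordpiece_fallback <- function(word, vocab, unk_token = "[UNK]") {
  pieces <- character(0)
  i <- 1L
  n <- nchar(word)
  while (i <= n) {
    j <- n
    found <- NULL
    # Try the longest remaining substring first, then shrink it.
    while (j >= i) {
      candidate <- substr(word, i, j)
      if (i > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      j <- j - 1L
    }
    # No piece matches at this position: the whole word is unknown.
    if (is.null(found)) return(unk_token)
    pieces <- c(pieces, found)
    i <- j + 1L
  }
  pieces
}

# Use the lookup table when the word is present, otherwise fall back.
toy_tokenize <- function(word, lookup, vocab) {
  if (word %in% names(lookup)) return(lookup[[word]])
  wordpiece_fallback(word, vocab)
}

toy_tokenize("walked", toy_lookup, toy_vocab)   # found in the lookup table
toy_tokenize("playing", toy_lookup, toy_vocab)  # handled by the fallback
```

For real text, the exported `morphemepiece_tokenize` function is the entry point; see its help page for the actual arguments and defaults.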
Authors:
morphemepiece_1.2.3.tar.gz
morphemepiece_1.2.3.zip (r-4.5), morphemepiece_1.2.3.zip (r-4.4), morphemepiece_1.2.3.zip (r-4.3)
morphemepiece_1.2.3.tgz (r-4.4-any), morphemepiece_1.2.3.tgz (r-4.3-any)
morphemepiece_1.2.3.tar.gz (r-4.5-noble), morphemepiece_1.2.3.tar.gz (r-4.4-noble)
morphemepiece_1.2.3.tgz (r-4.4-emscripten), morphemepiece_1.2.3.tgz (r-4.3-emscripten)
morphemepiece.pdf | morphemepiece.html
morphemepiece/json (API)
NEWS
# Install 'morphemepiece' in R:
install.packages('morphemepiece', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/macmillancontentscience/morphemepiece/issues
Last updated 3 years ago from:bc071b1a03. Checks: OK: 3, NOTE: 4. Indexed: no.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Nov 19 2024 |
R-4.5-win | NOTE | Nov 19 2024 |
R-4.5-linux | NOTE | Nov 19 2024 |
R-4.4-win | NOTE | Nov 19 2024 |
R-4.4-mac | NOTE | Nov 19 2024 |
R-4.3-win | OK | Nov 19 2024 |
R-4.3-mac | OK | Nov 19 2024 |
Exports: load_lookup, load_or_retrieve_lookup, load_or_retrieve_vocab, load_vocab, morphemepiece_cache_dir, morphemepiece_lookup, morphemepiece_tokenize, morphemepiece_vocab, prepare_vocab, set_morphemepiece_cache_dir
Dependencies: bit, bit64, cachem, cli, clipr, cpp11, crayon, digest, dlr, fansi, fastmap, fastmatch, fs, glue, hms, lifecycle, magrittr, memoise, morphemepiece.data, piecemaker, pillar, pkgconfig, prettyunits, progress, purrr, R6, rappdirs, readr, rlang, stringi, stringr, tibble, tidyselect, tzdb, utf8, vctrs, vroom, withr
Readme and manuals
Help Manual
Help page | Topics |
---|---|
morphemepiece: Morpheme Tokenization | morphemepiece-package |
Load a morphemepiece lookup file | load_lookup |
Load a lookup file, or retrieve from cache | load_or_retrieve_lookup |
Load a vocabulary file, or retrieve from cache | load_or_retrieve_vocab |
Load a vocabulary file | load_vocab |
Retrieve Directory for Morphemepiece Cache | morphemepiece_cache_dir |
Tokenize Sequence with Morpheme Pieces | morphemepiece_tokenize |
Format a Token List as a Vocabulary | prepare_vocab |
Set a Cache Directory for Morphemepiece | set_morphemepiece_cache_dir |