Title: | Data for Morpheme Tokenization |
---|---|
Description: | Provides data about morphemes, the smallest units of meaning in a language. |
Authors: | Jonathan Bratt [aut] , Jon Harmon [aut, cre] , Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph] |
Maintainer: | Jon Harmon <[email protected]> |
License: | Apache License (>= 2) |
Version: | 1.2.0 |
Built: | 2024-10-31 20:27:37 UTC |
Source: | https://github.com/macmillancontentscience/morphemepiece.data |
A morphemepiece lookup is a named character vector. The names of the vector are the words, and the values are the space-separated morpheme breakdowns of those words.
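Because each lookup value is one space-separated string, recovering the individual morphemes for a word takes two steps: index by name, then split on spaces. The sketch below shows this on a tiny hand-made vector. `toy_lookup` and `split_breakdown()` are hypothetical names, and the entries are invented for illustration. Real entries from `morphemepiece_lookup()` may be spelled or marked differently.

```r
# Hypothetical stand-in for morphemepiece_lookup(): names are words and
# values are space-separated morpheme breakdowns (entries are invented).
toy_lookup <- c(
  unhappiness = "un happy ness",
  rethinking  = "re think ing"
)

# Hypothetical helper: return one word's morphemes as a character vector.
# `[[` drops the name and raises an error if the word is not in the lookup.
split_breakdown <- function(lookup, word) {
  breakdown <- lookup[[word]]
  strsplit(breakdown, " ", fixed = TRUE)[[1]]
}

split_breakdown(toy_lookup, "unhappiness")  # returns c("un", "happy", "ness")
```

Using `fixed = TRUE` treats the space as a literal separator rather than a regular expression, which is all the documented format requires.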
morphemepiece_lookup()
morphemepiece_lookup()
A named character vector.
head(morphemepiece_lookup())
A morphemepiece vocabulary is a named integer vector with class "morphemepiece_vocabulary". The names of the vector are the morphemes, and the values are the integer identifiers of those tokens. The vocabulary is 0-indexed for compatibility with Python implementations.
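Since R vectors are 1-indexed but the token ids start at 0, it helps to look ids up by value rather than by position. The sketch below uses a small hand-made vector with the documented shape. `toy_vocab`, `morphemes_to_ids()` and `ids_to_morphemes()` are hypothetical names, and the morphemes and ids are invented.

```r
# Hypothetical stand-in for morphemepiece_vocab(): a named integer vector
# of class "morphemepiece_vocabulary" whose ids start at 0.
toy_vocab <- structure(
  c(un = 0L, happy = 1L, ness = 2L, re = 3L),
  class = "morphemepiece_vocabulary"
)

# Morphemes -> 0-based ids. Unknown morphemes give NA.
morphemes_to_ids <- function(vocab, morphemes) {
  unname(unclass(vocab)[morphemes])
}

# 0-based ids -> morphemes, matched by value so that the order in which
# the ids are stored does not matter.
ids_to_morphemes <- function(vocab, ids) {
  plain <- unclass(vocab)
  names(plain)[match(ids, plain)]
}

morphemes_to_ids(toy_vocab, c("un", "happy", "ness"))  # returns c(0L, 1L, 2L)
ids_to_morphemes(toy_vocab, c(0L, 1L, 2L))             # returns c("un", "happy", "ness")
```

If the ids are known to be contiguous and stored in order, `names(vocab)[id + 1L]` is the positional shortcut. The `match()` version avoids relying on that assumption.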
morphemepiece_vocab()
A named integer vector of class "morphemepiece_vocabulary".
head(morphemepiece_vocab())
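The two datasets fit together naturally: the lookup turns a word into morphemes, and the vocabulary turns those morphemes into integer ids. The following simplified sketch chains the two steps on invented toy data. `word_to_ids()` and both toy vectors are hypothetical. Words missing from the lookup just return `NA` here, and a complete tokenizer would need its own fallback for them.

```r
# Hypothetical toy data shaped like morphemepiece_lookup() and
# morphemepiece_vocab(); the real data are far larger and may be
# formatted differently.
toy_lookup <- c(unhappiness = "un happy ness", rethink = "re think")
toy_vocab <- structure(
  c(un = 0L, happy = 1L, ness = 2L, re = 3L, think = 4L),
  class = "morphemepiece_vocabulary"
)

# Simplified sketch: word -> morphemes (lookup) -> 0-based ids (vocab).
# Words absent from the lookup yield NA instead of being tokenized.
word_to_ids <- function(word, lookup, vocab) {
  if (!word %in% names(lookup)) return(NA_integer_)
  morphemes <- strsplit(lookup[[word]], " ", fixed = TRUE)[[1]]
  unname(unclass(vocab)[morphemes])
}

word_to_ids("unhappiness", toy_lookup, toy_vocab)  # returns c(0L, 1L, 2L)
word_to_ids("rethink", toy_lookup, toy_vocab)      # returns c(3L, 4L)
```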