# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------

cff-version: 1.2.0
message: 'To cite package "wordpiece.data" in publications use:'
type: software
license: Apache-2.0
title: 'wordpiece.data: Data for Wordpiece-Style Tokenization'
version: 2.0.0
doi: 10.32614/CRAN.package.wordpiece.data
abstract: Provides data to be used by the wordpiece algorithm in order to tokenize
  text into somewhat meaningful chunks. Included vocabularies were retrieved from
  and and parsed into an R-friendly format.
authors:
- family-names: Bratt
  given-names: Jonathan
  email: jonathan.bratt@macmillan.com
  orcid: https://orcid.org/0000-0003-2859-0076
- family-names: Harmon
  given-names: Jon
  email: jonthegeek@gmail.com
  orcid: https://orcid.org/0000-0003-4781-4346
repository: https://jonthegeek.r-universe.dev
repository-code: https://github.com/macmillancontentscience/wordpiece.data
commit: f893df5061be8f53fd586b142274b7ed669112c9
url: https://github.com/macmillancontentscience/wordpiece.data
contact:
- family-names: Harmon
  given-names: Jon
  email: jonthegeek@gmail.com
  orcid: https://orcid.org/0000-0003-4781-4346