Package: piecemaker
Title: Tools for Preparing Text for Tokenizers
Version: 1.0.2.9000
Authors@R: c(
    person("Jon", "Harmon", , "jonthegeek@gmail.com", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0003-4781-4346")),
    person("Jonathan", "Bratt", , "jonathan.bratt@macmillan.com", role = "aut",
           comment = c(ORCID = "0000-0003-2859-0076")),
    person("Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning", role = "cph")
  )
Description: Tokenizers break text into pieces that are more usable by
    machine learning models. Many tokenizers share some preparation steps.
    This package provides those shared steps, along with a simple
    tokenizer.
License: Apache License (>= 2)
URL: https://github.com/macmillancontentscience/piecemaker,
    https://macmillancontentscience.github.io/piecemaker/
BugReports: https://github.com/macmillancontentscience/piecemaker/issues
Depends: R (>= 2.10)
Imports: cli, glue, rlang (>= 0.4.2), stringi, stringr
Suggests: covr, testthat (>= 3.0.0)
Config/testthat/edition: 3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Config/pak/sysreqs: libicu-dev
Repository: https://jonthegeek.r-universe.dev
Date/Publication: 2023-06-02 19:46:08 UTC
RemoteUrl: https://github.com/macmillancontentscience/piecemaker
RemoteRef: HEAD
RemoteSha: b02c1a74923301545366805680e54091675305c6
NeedsCompilation: no
Packaged: 2026-05-26 09:10:15 UTC; root
Author: Jon Harmon [aut, cre] (ORCID: <https://orcid.org/0000-0003-4781-4346>),
    Jonathan Bratt [aut] (ORCID: <https://orcid.org/0000-0003-2859-0076>),
    Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer: Jon Harmon <jonthegeek@gmail.com>