Package: piecemaker 1.0.2.9000

piecemaker: Tools for Preparing Text for Tokenizers

Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.

Authors: Jon Harmon [aut, cre], Jonathan Bratt [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]

piecemaker_1.0.2.9000.tar.gz
piecemaker_1.0.2.9000.zip (r-4.5), piecemaker_1.0.2.9000.zip (r-4.4), piecemaker_1.0.2.9000.zip (r-4.3)
piecemaker_1.0.2.9000.tgz (r-4.5-any), piecemaker_1.0.2.9000.tgz (r-4.4-any), piecemaker_1.0.2.9000.tgz (r-4.3-any)
piecemaker_1.0.2.9000.tar.gz (r-4.5-noble), piecemaker_1.0.2.9000.tar.gz (r-4.4-noble)
piecemaker_1.0.2.9000.tgz (r-4.4-emscripten), piecemaker_1.0.2.9000.tgz (r-4.3-emscripten)
piecemaker.pdf | piecemaker.html
piecemaker/json (API)
NEWS

# Install 'piecemaker' in R:
install.packages('piecemaker', repos = c('https://jonthegeek.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/piecemaker/issues

Pkgdown site: https://macmillancontentscience.github.io

3.48 score, 2 packages, 6 scripts, 255 downloads, 10 exports, 8 dependencies

Last updated 2 years ago from: b02c1a7492. Checks: 8 OK. Indexed: no.

Target           Result  Latest binary
Doc / Vignettes  OK      Feb 05 2025
R-4.5-win        OK      Feb 05 2025
R-4.5-mac        OK      Feb 05 2025
R-4.5-linux      OK      Feb 05 2025
R-4.4-win        OK      Feb 05 2025
R-4.4-mac        OK      Feb 05 2025
R-4.3-win        OK      Feb 05 2025
R-4.3-mac        OK      Feb 05 2025

Exports: prepare_and_tokenize, prepare_text, remove_control_characters, remove_diacritics, remove_replacement_characters, space_cjk, space_punctuation, squish_whitespace, tokenize_space, validate_utf8
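
As a rough illustration of how the exported functions might fit together, the sketch below prepares a string and then splits it on spaces. It assumes the package is installed and that each function accepts a character vector as its first argument with usable defaults. This page does not state the exact signatures or return types, so check the package reference before relying on them.

```r
# Sketch only: assumes piecemaker is installed and that each function
# takes a character vector as its first argument (an assumption here).
library(piecemaker)

text <- "Tokenizers   break text into pieces."

# Apply the shared preparation steps with their default settings.
prepared <- prepare_text(text)

# Split the prepared text at spaces using the simple tokenizer.
tokens <- tokenize_space(prepared)

# Alternatively, prepare and tokenize in a single call.
tokens_combined <- prepare_and_tokenize(text)

print(prepared)
print(tokens)
```

The individual helpers in the exports list (for example squish_whitespace or remove_diacritics) can presumably also be called on their own when only one preparation step is needed.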

Dependencies: cli, glue, lifecycle, magrittr, rlang, stringi, stringr, vctrs