| Title: | Tidy BERT-like Models |
|---|---|
| Description: | Implements BERT-like NLP models with a consistent interface for fitting and creating predictions. The models are fully compatible with the tidymodels framework. |
| Authors: | Jonathan Bratt [aut], Jon Harmon [aut, cre], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph] |
| Maintainer: | Jon Harmon <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 0.0.0.9900 |
| Built: | 2024-11-20 04:36:32 UTC |
| Source: | https://github.com/macmillancontentscience/tidybert |
bert()
defines a model that fine-tunes a pre-trained BERT-like model to a
classification or regression task.
bert(
  mode = "unknown",
  engine = "tidybert",
  epochs = 10,
  batch_size = 128,
  bert_type = "bert_small_uncased",
  n_tokens = 1
)
| Argument | Description |
|---|---|
| mode | A single character string for the prediction outcome mode. Possible values for this model are "unknown", "regression", or "classification". |
| engine | A single character string specifying what computational engine to use for fitting. The only implemented option is "tidybert". |
| epochs | A single integer indicating the maximum number of epochs for training, or a vector of two integers indicating the minimum and maximum number of epochs for training. |
| batch_size | The number of samples to load in each batch during training. |
| bert_type | Character; which flavor of BERT to use. See torchtransformers::available_berts() for the list of available models. |
| n_tokens | An integer scalar indicating the number of tokens in the output. |
This package (tidybert) is currently the only engine for this model. See tidybert_engine for parameters available in this engine. The defined model is appropriate for use with parsnip and the rest of the tidymodels framework.
A specification for a model.
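Because bert() produces a parsnip model specification, it can be fit like any other parsnip model. A minimal sketch follows; the movie_reviews data frame and its sentiment and text columns are hypothetical, and fitting downloads pretrained weights, so the fit call is shown commented out.

library(parsnip)

# Specify a BERT classifier; epochs kept small for illustration.
spec <- bert(mode = "classification", epochs = 2)

# Fit with the usual parsnip formula interface (not run here):
# fitted <- fit(spec, sentiment ~ text, data = movie_reviews)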
bert_classification()
fits a classifier neural network in the style of
BERT from Google Research.
bert_classification(x, ...)

## Default S3 method:
bert_classification(x, ...)

## S3 method for class 'data.frame'
bert_classification(
  x, y,
  valid_x = 0.1,
  valid_y = NULL,
  bert_type = "bert_tiny_uncased",
  n_tokens = torchtransformers::config_bert(bert_type, "max_tokens"),
  loss = torch::nn_cross_entropy_loss(),
  optimizer = torch::optim_adam,
  metrics = list(luz::luz_metric_accuracy()),
  epochs = 10,
  batch_size = 128,
  luz_opt_hparams = list(),
  ...
)

## S3 method for class 'matrix'
bert_classification(
  x, y,
  valid_x = 0.1,
  valid_y = NULL,
  bert_type = "bert_tiny_uncased",
  n_tokens = torchtransformers::config_bert(bert_type, "max_tokens"),
  loss = torch::nn_cross_entropy_loss(),
  optimizer = torch::optim_adam,
  metrics = list(luz::luz_metric_accuracy()),
  epochs = 10,
  batch_size = 128,
  luz_opt_hparams = list(),
  ...
)

## S3 method for class 'formula'
bert_classification(
  formula, data,
  valid_data = 0.1,
  bert_type = "bert_tiny_uncased",
  n_tokens = torchtransformers::config_bert(bert_type, "max_tokens"),
  loss = torch::nn_cross_entropy_loss(),
  optimizer = torch::optim_adam,
  metrics = list(luz::luz_metric_accuracy()),
  epochs = 10,
  batch_size = 128,
  luz_opt_hparams = list(),
  ...
)
| Argument | Description |
|---|---|
| x | Depending on the context: a data frame of character predictors, or a matrix of character predictors. |
| ... | Additional parameters to pass to methods or to luz for fitting. |
| y | When x is a data frame or matrix, the outcome: a vector, or a data frame or matrix with a single outcome column. For classification, the outcome should be a factor. |
| valid_x | Depending on the context: a single numeric between 0 and 1 giving the fraction of the training data to hold out for validation, or a data frame or matrix of validation predictors. |
| valid_y | When valid_x is a data frame or matrix of predictors, the corresponding validation outcomes. |
| bert_type | Character; which flavor of BERT to use. See torchtransformers::available_berts() for the list of available models. |
| n_tokens | An integer scalar indicating the number of tokens in the output. |
| loss | (torch loss module) The loss function used to train the network. Defaults to torch::nn_cross_entropy_loss(). |
| optimizer | (torch optimizer) The optimizer used to train the network. Defaults to torch::optim_adam. |
| metrics | (list) A list of luz metrics to track during training. Defaults to list(luz::luz_metric_accuracy()). |
| epochs | (int) The maximum number of epochs for training the model. A single value is taken as the maximum; a vector of two integers gives the minimum and maximum number of epochs. |
| batch_size | (int, optional) How many samples per batch to load (default: 128). |
| luz_opt_hparams | List; hyperparameters to pass on to the optimizer via luz::set_opt_hparams(). |
| formula | A formula specifying the outcome term on the left-hand side, and the predictor terms on the right-hand side. |
| data | When a formula is used, a data frame containing both the predictors and the outcome. |
| valid_data | When a formula is used, either a single numeric between 0 and 1 giving the fraction of data to hold out for validation, or a separate data frame of validation data. |
The generated model is a pretrained BERT model with a final dense linear
layer to map the output to the outcome levels, constructed using
model_bert_linear()
. That pretrained model is fine-tuned on the provided
training data. Input data (during both fitting and prediction) is
automatically tokenized to match the tokenization expected by the BERT model.
A bert_classification
object.
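For example, a minimal call with the formula interface might look like the following sketch. The data frame df and its columns label and text are hypothetical, and fitting downloads pretrained weights, so the call is shown commented out.

# Sketch only: `df` is a hypothetical data frame with a factor column
# `label` and a character column `text`.
# fitted <- bert_classification(
#   label ~ text,
#   data = df,
#   bert_type = "bert_tiny_uncased",
#   epochs = 1
# )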
bert_regression()
fits a regression neural network in the style of
BERT from Google Research.
bert_regression(x, ...)

## Default S3 method:
bert_regression(x, ...)

## S3 method for class 'data.frame'
bert_regression(
  x, y,
  valid_x = 0.1,
  valid_y = NULL,
  bert_type = "bert_tiny_uncased",
  n_tokens = torchtransformers::config_bert(bert_type, "max_tokens"),
  loss = torch::nn_mse_loss(),
  optimizer = torch::optim_adam,
  metrics = list(luz::luz_metric_rmse()),
  epochs = 10,
  batch_size = 128,
  luz_opt_hparams = list(),
  ...
)

## S3 method for class 'matrix'
bert_regression(
  x, y,
  valid_x = 0.1,
  valid_y = NULL,
  bert_type = "bert_tiny_uncased",
  n_tokens = torchtransformers::config_bert(bert_type, "max_tokens"),
  loss = torch::nn_mse_loss(),
  optimizer = torch::optim_adam,
  metrics = list(luz::luz_metric_rmse()),
  epochs = 10,
  batch_size = 128,
  luz_opt_hparams = list(),
  ...
)

## S3 method for class 'formula'
bert_regression(
  formula, data,
  valid_data = 0.1,
  bert_type = "bert_tiny_uncased",
  n_tokens = torchtransformers::config_bert(bert_type, "max_tokens"),
  loss = torch::nn_mse_loss(),
  optimizer = torch::optim_adam,
  metrics = list(luz::luz_metric_rmse()),
  epochs = 10,
  batch_size = 128,
  luz_opt_hparams = list(),
  ...
)
| Argument | Description |
|---|---|
| x | Depending on the context: a data frame of character predictors, or a matrix of character predictors. |
| ... | Additional parameters to pass to methods or to luz for fitting. |
| y | When x is a data frame or matrix, the outcome: a vector, or a data frame or matrix with a single outcome column. For regression, the outcome should be numeric. |
| valid_x | Depending on the context: a single numeric between 0 and 1 giving the fraction of the training data to hold out for validation, or a data frame or matrix of validation predictors. |
| valid_y | When valid_x is a data frame or matrix of predictors, the corresponding validation outcomes. |
| bert_type | Character; which flavor of BERT to use. See torchtransformers::available_berts() for the list of available models. |
| n_tokens | An integer scalar indicating the number of tokens in the output. |
| loss | (torch loss module) The loss function used to train the network. Defaults to torch::nn_mse_loss(). |
| optimizer | (torch optimizer) The optimizer used to train the network. Defaults to torch::optim_adam. |
| metrics | (list) A list of luz metrics to track during training. Defaults to list(luz::luz_metric_rmse()). |
| epochs | (int) The maximum number of epochs for training the model. A single value is taken as the maximum; a vector of two integers gives the minimum and maximum number of epochs. |
| batch_size | (int, optional) How many samples per batch to load (default: 128). |
| luz_opt_hparams | List; hyperparameters to pass on to the optimizer via luz::set_opt_hparams(). |
| formula | A formula specifying the outcome term on the left-hand side, and the predictor terms on the right-hand side. |
| data | When a formula is used, a data frame containing both the predictors and the outcome. |
| valid_data | When a formula is used, either a single numeric between 0 and 1 giving the fraction of data to hold out for validation, or a separate data frame of validation data. |
The generated model is a pretrained BERT model with a final dense linear
layer to map the output to a numerical value, constructed using
model_bert_linear()
. That pretrained model is fine-tuned on the provided
training data. Input data (during both fitting and prediction) is
automatically tokenized to match the tokenization expected by the BERT model.
A bert_regression
object.
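A corresponding sketch for regression, again with hypothetical data and shown commented out since fitting downloads pretrained weights:

# Sketch only: `df` is a hypothetical data frame with a numeric column
# `score` and a character column `text`.
# fitted <- bert_regression(score ~ text, data = df, epochs = 1)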
bert_type() is a tuning parameter for the pre-trained BERT model that will be fine-tuned.
bert_type(
  values = c(
    "bert_tiny_uncased", "bert_mini_uncased", "bert_small_uncased",
    "bert_medium_uncased", "bert_base_uncased", "bert_base_cased",
    "bert_large_uncased"
  )
)
| Argument | Description |
|---|---|
| values | A character vector indicating the names of available models. The default uses the 7 named pre-trained BERT models. We recommend that you select specific models that are likely to work on your hardware. See torchtransformers::available_berts() for the full list of known models. |
A parameter that can be tuned with the tune
package.
if (rlang::is_installed("dials")) {
  bert_type()
}
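As a dials parameter, bert_type() can also be queried for candidate values when building a tuning grid. A small sketch, assuming dials is installed:

if (rlang::is_installed("dials")) {
  # List candidate pre-trained models for tuning.
  dials::value_seq(bert_type(), n = 3)
}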
Construct a BERT model with pretrained weights, and add a final dense linear layer to transform to a desired number of dimensions. Note that we only use the CLS token output from the final layer of the BERT model. It is possible to attach a classification or regression head to BERT using other techniques, but here we use this simple approach.
model_bert_linear(bert_type = "bert_tiny_uncased", output_dim = 1L)
| Argument | Description |
|---|---|
| bert_type | Character; which flavor of BERT to use. See torchtransformers::available_berts() for the list of available models. |
| output_dim | Integer; the target number of output dimensions. |
A torch neural net model with pretrained BERT weights and a final dense layer.
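For instance, a head for a three-class outcome could be constructed directly. Shown commented out because construction downloads pretrained weights on first use:

# Construct a module whose final dense layer maps the CLS output to
# three dimensions, e.g. for a three-class classifier.
# model <- model_bert_linear(bert_type = "bert_tiny_uncased", output_dim = 3L)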
n_tokens() is a tuning parameter for the number of tokens to use when tokenizing predictors.
n_tokens(range = c(1, 9), trans = scales::log2_trans())
| Argument | Description |
|---|---|
| range | A two-element integer vector with the smallest and largest possible values. Because of the log2 transformation, these values are the powers of two to try; the default c(1, 9) corresponds to 2 through 512 tokens. |
| trans | An optional transformation to apply. By default, values are generated on the log2 scale via scales::log2_trans(). |
A parameter that can be tuned with the tune
package.
if (rlang::is_installed("dials")) {
  n_tokens()
}
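Because of the log2 transformation, tuning candidates are powers of two. A short sketch, assuming dials is installed:

if (rlang::is_installed("dials")) {
  # With the default range c(1, 9), candidate values fall between
  # 2^1 and 2^9 tokens.
  dials::value_seq(n_tokens(), n = 5)
  # A regular tuning grid over the same parameter:
  dials::grid_regular(n_tokens(), levels = 3)
}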
Predict from a bert_classification model.
## S3 method for class 'bert_classification'
predict(object, new_data, type = c("class", "prob"), ...)
| Argument | Description |
|---|---|
| object | A bert_classification object. |
| new_data | A data frame or matrix of new character predictors. This data is automatically tokenized to match the tokenization expected by the BERT model. |
| type | A single character. The type of predictions to generate. Valid options are "class" for hard class predictions and "prob" for class probability predictions. |
| ... | Not used, but required for extensibility. |
A tibble of predictions. The number of rows in the tibble is guaranteed to be
the same as the number of rows in new_data
.
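A prediction sketch, assuming a fitted bert_classification model named fitted and a hypothetical data frame new_df of character predictors; the output column names follow the usual tidymodels conventions:

# Hard class predictions: a tibble with a `.pred_class` column.
# predict(fitted, new_data = new_df, type = "class")

# Class probabilities: one `.pred_<level>` column per outcome level.
# predict(fitted, new_data = new_df, type = "prob")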
Predict from a bert_regression model.
## S3 method for class 'bert_regression'
predict(object, new_data, ...)
| Argument | Description |
|---|---|
| object | A bert_regression object. |
| new_data | A data frame or matrix of new character predictors. This data is automatically tokenized to match the tokenization expected by the BERT model. |
| ... | Not used, but required for extensibility. |
A tibble of predictions. The number of rows in the tibble is guaranteed to be
the same as the number of rows in new_data
.
Given the output from a transformer model, construct tidy data frames for the layer outputs and the attention weights.
tidy_bert_output(bert_model_output, tokenized)
| Argument | Description |
|---|---|
| bert_model_output | The output from a BERT model. |
| tokenized | The raw output from torchtransformers::tokenize_bert(). |
A list of data frames, one for the layer output embeddings and one for the attention weights.