Transformers
NLP Architect integrates the Transformer models available in pytorch-transformers. Using a pre-trained Transformer model on a target (down-stream) task is usually done by attaching a classification head to the transformer model and fine-tuning both (transformer and classifier) on the target task.
Base model
TransformerBase is a base class that handles loading, saving, training and inference of transformer models. The base model supports the pytorch-transformers configs, tokenizers and base models documented on their website (see our base class for the supported models).
To use the Transformer models, sub-class the base model (as sketched below) and add:
- A classifier (head) for your task.
- A sub-method that converts raw inputs into the tensors the model consumes.
- Any sub-methods needed to evaluate the task, run inference, etc.
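The following is a schematic sketch of these pieces, written directly against pytorch-transformers rather than the actual TransformerBase internals; the class name ToySequenceClassifier and the method texts_to_tensors are illustrative assumptions, not part of the library.

```python
# Schematic sketch only: a classification head plus an input-to-tensor
# sub-method on top of a pytorch-transformers base model. Class and
# method names are illustrative, not the TransformerBase API.
import torch
from torch import nn
from pytorch_transformers import BertModel, BertTokenizer

class ToySequenceClassifier(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.tokenizer = BertTokenizer.from_pretrained(model_name)
        self.transformer = BertModel.from_pretrained(model_name)
        # task-specific classification head attached to the transformer
        self.classifier = nn.Linear(self.transformer.config.hidden_size, num_labels)

    def texts_to_tensors(self, texts):
        # sub-method: convert raw text inputs to model input tensors
        ids = []
        for text in texts:
            tokens = ["[CLS]"] + self.tokenizer.tokenize(text) + ["[SEP]"]
            ids.append(self.tokenizer.convert_tokens_to_ids(tokens))
        max_len = max(len(seq) for seq in ids)
        padded = [seq + [0] * (max_len - len(seq)) for seq in ids]  # 0 = BERT pad id
        return torch.tensor(padded)

    def forward(self, input_ids):
        # BertModel returns (sequence_output, pooled_output)
        sequence_output, pooled_output = self.transformer(input_ids)
        return self.classifier(pooled_output)
```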
Models
Available transformer family models in NLP Architect:
|  | BERT | Quantized BERT | XLM | XLNet | RoBERTa |
|---|---|---|---|---|---|
| Sequence classification | Y | Y | Y | Y | Y |
| Token classification | Y | Y | Y | Y |  |
Sequence classification
TransformerSequenceClassifier
is a transformer model with a sentence classification head (the [CLS]
token representation is used as the classification input) for sentence-level tasks (classification/regression).
See nlp_architect.procedures.transformers.glue
for an example of training sequence classification models on GLUE benchmark tasks.
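As a minimal illustration of the [CLS]-based head, here is a sketch assuming pytorch-transformers' BertModel; this is not the TransformerSequenceClassifier implementation itself, and the 3-class head is an arbitrary example.

```python
# Minimal sketch of [CLS]-based sentence classification, assuming
# pytorch-transformers' BertModel; not the actual implementation.
import torch
from torch import nn
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
head = nn.Linear(model.config.hidden_size, 3)  # e.g. a 3-class task (illustrative)

tokens = ["[CLS]"] + tokenizer.tokenize("a sentence to classify") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    sequence_output, _ = model(input_ids)
cls_vector = sequence_output[:, 0]  # hidden state of the [CLS] token
logits = head(cls_vector)           # shape: (batch, num_labels)
```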
Training a model on a GLUE task, using the BERT-base uncased base model:
```
nlp-train transformer_glue \
    --task_name <task name> \
    --model_name_or_path bert-base-uncased \
    --model_type bert \
    --output_dir <output dir> \
    --evaluate_during_training \
    --data_dir </path/to/glue_task> \
    --do_lower_case
```
Running inference with a trained model:
```
nlp-inference run transformer_glue \
    --model_path <path to model> \
    --task_name <task_name> \
    --model_type bert \
    --output_dir <output dir> \
    --data_dir <path to data> \
    --do_lower_case \
    --overwrite_output_dir
```
To run evaluation on the task's development set, add the --evaluate flag to the command line.
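For example, reusing the run command above with the evaluation flag added:

```
nlp-inference run transformer_glue \
    --model_path <path to model> \
    --task_name <task_name> \
    --model_type bert \
    --output_dir <output dir> \
    --data_dir <path to data> \
    --do_lower_case \
    --overwrite_output_dir \
    --evaluate
```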
Token classification
TransformerTokenClassifier
is a transformer model with a token classification head for tasks such as NER, POS tagging or chunking.
See the TransformerTokenClassifier NER model description for a usage example.
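As a minimal sketch of the per-token classification idea, the snippet below attaches a tag head to pytorch-transformers' BertModel; this is not the actual TransformerTokenClassifier implementation, and the tag set size is illustrative.

```python
# Minimal sketch of per-token classification (e.g. NER tagging), assuming
# pytorch-transformers' BertModel; not the actual TransformerTokenClassifier.
import torch
from torch import nn
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
num_tags = 9  # e.g. a BIO tag set for NER (illustrative)
head = nn.Linear(model.config.hidden_size, num_tags)

tokens = ["[CLS]"] + tokenizer.tokenize("John lives in London") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    sequence_output, _ = model(input_ids)  # (batch, seq_len, hidden)
logits = head(sequence_output)             # one tag distribution per token
tags = logits.argmax(-1)                   # predicted tag id for every token
```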