NLP Architect integrates the Transformer models available in pytorch-transformers. A Transformer model based on a pre-trained model is usually used by attaching a classification head to the transformer and fine-tuning the whole model (transformer and classifier) on the target (downstream) task.
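
The key point of this setup is that fine-tuning updates the pre-trained weights as well as the new head. The toy sketch below makes that concrete in plain Python. It assumes a one-parameter stand-in (`enc`) for the pre-trained encoder and a one-parameter classifier (`head`) trained with binary cross-entropy. All names are hypothetical, and nothing here is NLP Architect or pytorch-transformers code.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(enc: float, head: float, x: float, y: int) -> float:
    """Binary cross-entropy of p = sigmoid(head * (enc * x)) against label y."""
    p = sigmoid(head * enc * x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fine_tune_step(enc: float, head: float, x: float, y: int, lr: float = 0.1):
    """One SGD step that updates BOTH the 'encoder' and the 'classifier'.

    enc  : hypothetical stand-in for the pre-trained transformer weights
    head : hypothetical stand-in for the newly attached classification head
    """
    h = enc * x               # "encoder" output (the representation)
    p = sigmoid(head * h)     # classifier output (probability of label 1)
    dz = p - y                # dLoss/dlogit for sigmoid + binary cross-entropy
    grad_head = dz * h        # dlogit/dhead = h
    grad_enc = dz * head * x  # dlogit/denc = head * x
    return enc - lr * grad_enc, head - lr * grad_head


enc, head = 1.0, 0.1  # "pre-trained" encoder weight, freshly initialised head
x, y = 2.0, 1         # one toy training example
loss_before = bce_loss(enc, head, x, y)
for _ in range(20):
    enc, head = fine_tune_step(enc, head, x, y)
loss_after = bce_loss(enc, head, x, y)
```

Freezing the encoder (updating only `head`) would instead give a feature-extraction setup; the fine-tuning approach described above trains both.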

Base model

TransformerBase is a base class for handling loading, saving, training and inference of transformer models.

The base model supports pytorch-transformers configs, tokenizers and base models as documented on the pytorch-transformers website (see the base class for the list of supported models).

To use a Transformer model, subclass the base model and include:

  • A classifier (head) for your task.
  • A method that converts the input into the tensors used by the model.
  • Any methods needed to evaluate the task, run inference, etc.
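
The subclassing pattern above can be sketched without any deep-learning framework. `MiniTransformerBase`, `ToySentimentClassifier`, `convert_to_features`, `predict` and `evaluate` are hypothetical names invented for illustration, not the actual TransformerBase API. The stub encoder stands in for a pre-trained transformer; a real subclass would build torch tensors and call the pytorch-transformers model instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

PAD_ID, CLS_ID, UNK_ID = 0, 1, 2  # hypothetical special-token ids


@dataclass
class Features:
    """Inputs for one example; a real model would batch these as tensors."""
    input_ids: List[int]
    attention_mask: List[int]


class MiniTransformerBase:
    """Hypothetical stand-in for a base class that owns tokenizer and encoder.

    The stub encoder replaces self-attention with a uniform average over the
    embeddings of unmasked positions, so the [CLS] hidden state summarises
    the whole input. A real base class would load a pre-trained model.
    """

    def __init__(self, vocab: Dict[str, int], embeddings: Dict[int, List[float]],
                 max_seq_length: int = 6):
        self.vocab = vocab
        self.embeddings = embeddings
        self.max_seq_length = max_seq_length

    def encode(self, f: Features) -> List[List[float]]:
        vecs = [self.embeddings[i] for i, m in zip(f.input_ids, f.attention_mask) if m]
        ctx = [sum(col) / len(vecs) for col in zip(*vecs)]
        return [list(ctx) for _ in f.input_ids]


class ToySentimentClassifier(MiniTransformerBase):
    """Hypothetical subclass adding the three task-specific pieces."""

    def __init__(self, vocab, embeddings, head_weights, head_bias, max_seq_length=6):
        super().__init__(vocab, embeddings, max_seq_length)
        # 1. Classifier head: a linear layer with one weight row per label.
        self.head_weights = head_weights
        self.head_bias = head_bias

    # 2. Convert raw text into padded ids plus an attention mask.
    def convert_to_features(self, text: str) -> Features:
        ids = [CLS_ID] + [self.vocab.get(w, UNK_ID) for w in text.lower().split()]
        ids = ids[: self.max_seq_length]
        pad = self.max_seq_length - len(ids)
        return Features(ids + [PAD_ID] * pad, [1] * len(ids) + [0] * pad)

    def predict(self, text: str) -> int:
        cls_vec = self.encode(self.convert_to_features(text))[0]
        logits = [sum(w * h for w, h in zip(row, cls_vec)) + b
                  for row, b in zip(self.head_weights, self.head_bias)]
        return max(range(len(logits)), key=logits.__getitem__)

    # 3. Task-specific evaluation (accuracy here).
    def evaluate(self, texts: Sequence[str], labels: Sequence[int]) -> float:
        correct = sum(self.predict(t) == y for t, y in zip(texts, labels))
        return correct / len(labels)


# Usage with toy values: label 0 = negative, label 1 = positive.
vocab = {"good": 3, "bad": 4}
emb = {PAD_ID: [0.0, 0.0], CLS_ID: [0.0, 0.0], UNK_ID: [0.0, 0.0],
       3: [1.0, 0.0], 4: [0.0, 1.0]}
model = ToySentimentClassifier(vocab, emb,
                               head_weights=[[0.0, 1.0], [1.0, 0.0]],
                               head_bias=[0.0, 0.0])
accuracy = model.evaluate(["good good", "bad", "very bad"], [1, 0, 0])
```

Keeping the encoder in the base class and only the head, feature conversion and evaluation in the subclass mirrors the division of work described above: each new task adds a small amount of task-specific code on top of shared loading and encoding logic.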


Available transformer family models in NLP Architect:

Sequence classification   Y  Y  Y  Y  Y
Token classification      Y  Y  -  Y  Y

Sequence classification

TransformerSequenceClassifier is a transformer model with a sentence classification head (the representation of the [CLS] token is used as the input to the classifier) for sentence-level tasks (classification or regression).
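
To make the role of [CLS] concrete: the head reads only the hidden state at the [CLS] position, which self-attention has already conditioned on the whole sentence. The plain-Python sketch below assumes a BERT-style layout where [CLS] is the first token and uses made-up numbers. `cls_head` is a hypothetical helper; real pytorch-transformers heads also apply dropout, and BERT first passes the [CLS] state through a pooler layer.

```python
import math
from typing import List


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """y = W x + b, one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cls_head(hidden_states: List[List[float]], weights, bias, regression: bool = False):
    """Hypothetical sequence head: only the first ([CLS]) position is used."""
    cls_vec = hidden_states[0]
    outputs = linear(cls_vec, weights, bias)
    if regression:
        return outputs[0]    # one real-valued score (e.g. for STS-B)
    return softmax(outputs)  # one probability per label


# Toy hidden states for a 3-token input ([CLS], tok1, tok2), hidden size 2.
hidden = [[0.5, -1.0], [2.0, 0.3], [0.1, 0.1]]
probs = cls_head(hidden, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
label = max(range(len(probs)), key=probs.__getitem__)
score = cls_head(hidden, weights=[[2.0, 1.0]], bias=[0.5], regression=True)
```

Classification and regression differ only in the number of outputs and whether a softmax is applied, which is why one head class can cover both kinds of task.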

See nlp_architect.procedures.transformers.glue for an example of training sequence classification models on GLUE benchmark tasks.

Training a model on a GLUE task, using the BERT-base uncased model:

nlp-train transformer_glue \
    --task_name <task name> \
    --model_name_or_path bert-base-uncased \
    --model_type bert \
    --output_dir <output dir> \
    --evaluate_during_training \
    --data_dir </path/to/glue_task>

Running inference with a trained model:

nlp-inference run transformer_glue \
    --model_path <path to model> \
    --task_name <task_name> \
    --model_type bert \
    --output_dir <output dir> \
    --data_dir <path to data> \
    --do_lower_case

To run evaluation on the task’s development set, add the --evaluate flag to the command line.

Token classification

TransformerTokenClassifier is a transformer model with a token classification head, for tasks such as NER, POS tagging or chunking.
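
A token classification head applies the same linear layer to every position and predicts one label per token. The plain-Python sketch below also shows a common way to map sub-token predictions back to words: keep only the prediction at each word's first sub-token. That alignment rule, the `token_head` and `word_labels` helpers, the sub-token split and the toy weights are assumptions for illustration, not the TransformerTokenClassifier implementation.

```python
from typing import List, Sequence


def token_head(hidden_states: List[List[float]], weights: List[List[float]],
               bias: List[float]) -> List[int]:
    """Apply the same linear layer to every position; return argmax label ids."""
    preds = []
    for h in hidden_states:
        logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]
        preds.append(max(range(len(logits)), key=logits.__getitem__))
    return preds


def word_labels(token_preds: Sequence[int], first_subtoken: Sequence[bool],
                id2label: Sequence[str]) -> List[str]:
    """Keep one prediction per word: the one at its first sub-token.

    Special tokens such as [CLS]/[SEP] would also be marked False.
    """
    return [id2label[p] for p, keep in zip(token_preds, first_subtoken) if keep]


LABELS = ["O", "B-PER", "I-PER"]
# Hypothetical sub-tokens for "John Smithson runs": John / Smith ##son / runs
hidden = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.7], [-1.0, -1.0]]
first = [True, True, False, True]
W = [[-1.0, -1.0], [2.0, 0.0], [0.0, 2.0]]
b = [0.5, 0.0, 0.0]
token_preds = token_head(hidden, W, b)
tags = word_labels(token_preds, first, LABELS)
```

Compared with the sequence head, the only structural change is that the linear layer is applied at every position instead of only at [CLS], so the output has one label per token rather than one per sentence.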

See the TransformerTokenClassifier NER model description for a usage example.