nlp_architect.models.transformers package
Submodules
nlp_architect.models.transformers.base_model module
class nlp_architect.models.transformers.base_model.InputFeatures(input_ids, input_mask, segment_ids, label_id=None, valid_ids=None)[source]
Bases: object

A single set of features of data.
class nlp_architect.models.transformers.base_model.TransformerBase(model_type: str, model_name_or_path: str, labels: List[str] = None, num_labels: int = None, config_name=None, tokenizer_name=None, do_lower_case=False, output_path=None, device='cpu', n_gpus=0)[source]
Bases: nlp_architect.models.TrainableModel

Transformers base model (for working with pytorch-transformers models).
MODEL_CONFIGURATIONS = {'bert': (<class 'transformers.configuration_bert.BertConfig'>, <class 'transformers.tokenization_bert.BertTokenizer'>), 'quant_bert': (<class 'nlp_architect.models.transformers.quantized_bert.QuantizedBertConfig'>, <class 'transformers.tokenization_bert.BertTokenizer'>), 'roberta': (<class 'transformers.configuration_roberta.RobertaConfig'>, <class 'transformers.tokenization_roberta.RobertaTokenizer'>), 'xlm': (<class 'transformers.configuration_xlm.XLMConfig'>, <class 'transformers.tokenization_xlm.XLMTokenizer'>), 'xlnet': (<class 'transformers.configuration_xlnet.XLNetConfig'>, <class 'transformers.tokenization_xlnet.XLNetTokenizer'>)}
static get_train_steps_epochs(max_steps: int, num_train_epochs: int, gradient_accumulation_steps: int, num_samples: int)[source]
Get the total number of training steps and epochs.

Parameters:
- max_steps (int) – max steps
- num_train_epochs (int) – number of epochs
- gradient_accumulation_steps (int) – gradient accumulation steps
- num_samples (int) – number of samples

Returns: total steps, number of epochs
Return type: Tuple
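A brief usage sketch (values are illustrative; treating num_samples as the number of training batches and deriving the step count from the epoch count when max_steps is -1 is an assumption about the helper's behavior):

    # Usage sketch for the static helper above (values are illustrative).
    from nlp_architect.models.transformers.base_model import TransformerBase

    total_steps, epochs = TransformerBase.get_train_steps_epochs(
        max_steps=-1,                    # -1: presumably derive steps from the epoch count
        num_train_epochs=3,
        gradient_accumulation_steps=1,
        num_samples=1000,                # e.g. len(train_dataloader)
    )
    print(total_steps, epochs)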
classmethod load_model(model_path: str, model_type: str, *args, **kwargs)[source]
Create a TransformerBase model from a given path.

Parameters:
- model_path (str) – path to model
- model_type (str) – model type

Returns: model
Return type: TransformerBase
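A hedged loading sketch: the path below is hypothetical and should point at a directory previously written by save_model(); model_type must be one of the keys of MODEL_CONFIGURATIONS above.

    # Reload a previously saved classifier (path is hypothetical).
    from nlp_architect.models.transformers.sequence_classification import TransformerSequenceClassifier

    model = TransformerSequenceClassifier.load_model(
        model_path="/tmp/my_bert_classifier",
        model_type="bert",
    )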
optimizer
save_model(output_dir: str, save_checkpoint: bool = False, args=None)[source]
Save the model, tokenizer, and arguments to the given output directory (see the sketch after the parameter list).

Parameters:
- output_dir (str) – path to output directory
- save_checkpoint (bool, optional) – save as checkpoint. Defaults to False.
- args ([type], optional) – arguments object to save. Defaults to None.
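A minimal save sketch, assuming model is a trained TransformerBase subclass and the output path is hypothetical:

    # Persist model weights and tokenizer files so they can later be restored
    # with load_model(); save_checkpoint=True presumably writes a checkpoint-style save.
    model.save_model(output_dir="/tmp/my_bert_classifier", save_checkpoint=False)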
 
save_model_checkpoint(output_path: str, name: str)[source]
Save a model checkpoint.

Parameters:
- output_path (str) – output path
- name (str) – name of checkpoint
 
scheduler
nlp_architect.models.transformers.quantized_bert module
Quantized BERT layers and model
class nlp_architect.models.transformers.quantized_bert.QuantizedBertAttention(config)[source]
Bases: transformers.modeling_bert.BertAttention
class nlp_architect.models.transformers.quantized_bert.QuantizedBertConfig(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, **kwargs)[source]
Bases: transformers.configuration_bert.BertConfig

pretrained_config_archive_map = {'bert-base-uncased': 'https://d2zs9tzlek599f.cloudfront.net/models/transformers/bert-base-uncased.json', 'bert-large-uncased': 'https://d2zs9tzlek599f.cloudfront.net/models/transformers/bert-large-uncased.json'}
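Since QuantizedBertConfig subclasses BertConfig, the standard transformers from_pretrained() API applies; a small sketch (the identifier matches a key of pretrained_config_archive_map above):

    # Load a quantized-BERT configuration by name (resolved through the
    # pretrained_config_archive_map shown above).
    from nlp_architect.models.transformers.quantized_bert import QuantizedBertConfig

    config = QuantizedBertConfig.from_pretrained("bert-base-uncased")
    print(config.num_hidden_layers)  # 12 for the base configuration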
class nlp_architect.models.transformers.quantized_bert.QuantizedBertEmbeddings(config)[source]
Bases: transformers.modeling_bert.BertEmbeddings

class nlp_architect.models.transformers.quantized_bert.QuantizedBertEncoder(config)[source]
Bases: transformers.modeling_bert.BertEncoder

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForQuestionAnswering(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForQuestionAnswering

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForSequenceClassification(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForSequenceClassification

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForTokenClassification(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForTokenClassification

class nlp_architect.models.transformers.quantized_bert.QuantizedBertIntermediate(config)[source]
Bases: transformers.modeling_bert.BertIntermediate

class nlp_architect.models.transformers.quantized_bert.QuantizedBertLayer(config)[source]
Bases: transformers.modeling_bert.BertLayer

class nlp_architect.models.transformers.quantized_bert.QuantizedBertModel(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertModel

class nlp_architect.models.transformers.quantized_bert.QuantizedBertOutput(config)[source]
Bases: transformers.modeling_bert.BertOutput

class nlp_architect.models.transformers.quantized_bert.QuantizedBertPooler(config)[source]
Bases: transformers.modeling_bert.BertPooler
class nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel(config, *inputs, **kwargs)[source]
Bases: transformers.modeling_bert.BertPreTrainedModel

base_model_prefix = 'quant_bert'

config_class
    alias of QuantizedBertConfig
class nlp_architect.models.transformers.quantized_bert.QuantizedBertSelfAttention(config)[source]
Bases: transformers.modeling_bert.BertSelfAttention

class nlp_architect.models.transformers.quantized_bert.QuantizedBertSelfOutput(config)[source]
Bases: transformers.modeling_bert.BertSelfOutput

nlp_architect.models.transformers.sequence_classification module
class nlp_architect.models.transformers.sequence_classification.TransformerSequenceClassifier(model_type: str, labels: List[str] = None, task_type='classification', metric_fn=<function accuracy>, load_quantized=False, *args, **kwargs)[source]
Bases: nlp_architect.models.transformers.base_model.TransformerBase

Transformer sequence classifier (a construction sketch follows the parameter list below).

Parameters:
- model_type (str) – transformer base model type
- labels (List[str], optional) – list of labels. Defaults to None.
- task_type (str, optional) – task type (classification/regression). Defaults to classification.
- metric_fn ([type], optional) – metric to use for evaluation. Defaults to simple_accuracy.
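A construction sketch (argument values are illustrative; passing model_name_or_path and other TransformerBase arguments through **kwargs is an assumption based on the base-class signature above):

    from nlp_architect.models.transformers.sequence_classification import TransformerSequenceClassifier

    classifier = TransformerSequenceClassifier(
        model_type="bert",                       # a key of MODEL_CLASS below
        model_name_or_path="bert-base-uncased",  # forwarded to TransformerBase
        labels=["negative", "positive"],
        task_type="classification",
    )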
 
MODEL_CLASS = {'bert': <class 'transformers.modeling_bert.BertForSequenceClassification'>, 'quant_bert': <class 'nlp_architect.models.transformers.quantized_bert.QuantizedBertForSequenceClassification'>, 'roberta': <class 'transformers.modeling_roberta.RobertaForSequenceClassification'>, 'xlm': <class 'transformers.modeling_xlm.XLMForSequenceClassification'>, 'xlnet': <class 'transformers.modeling_xlnet.XLNetForSequenceClassification'>}
convert_to_tensors(examples: List[nlp_architect.data.sequence_classification.SequenceClsInputExample], max_seq_length: int = 128, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]
Convert examples to a tensor dataset.

Parameters:
- examples (List[SequenceClsInputExample]) – examples
- max_seq_length (int, optional) – max sequence length. Defaults to 128.
- include_labels (bool, optional) – include labels. Defaults to True.

Return type: TensorDataset
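A conversion sketch, reusing the classifier built above; the SequenceClsInputExample constructor arguments shown (guid, text, label) are assumptions about nlp_architect.data.sequence_classification:

    from torch.utils.data import DataLoader, RandomSampler
    from nlp_architect.data.sequence_classification import SequenceClsInputExample

    train_examples = [
        SequenceClsInputExample(guid="train-1", text="great movie", label="positive"),
        SequenceClsInputExample(guid="train-2", text="terrible plot", label="negative"),
    ]
    train_dataset = classifier.convert_to_tensors(train_examples, max_seq_length=128)
    train_loader = DataLoader(train_dataset, sampler=RandomSampler(train_dataset), batch_size=8)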
evaluate_predictions(logits, label_ids)[source]
Run evaluation of given logits and truth labels.

Parameters:
- logits – model logits
- label_ids – truth label ids
 
inference(examples: List[nlp_architect.data.sequence_classification.SequenceClsInputExample], max_seq_length: int, batch_size: int = 64, evaluate=False)[source]
Run inference on the given examples.

Parameters:
- examples (List[SequenceClsInputExample]) – examples
- max_seq_length (int) – max sequence length
- batch_size (int, optional) – batch size. Defaults to 64.

Returns: logits
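An inference sketch (inputs are illustrative); the returned value is the model logits, and evaluate=True presumably also runs the configured metric when labels are present:

    test_examples = [SequenceClsInputExample(guid="test-1", text="an instant classic")]
    logits = classifier.inference(test_examples, max_seq_length=128, batch_size=32)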
train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, test_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, gradient_accumulation_steps: int = 1, per_gpu_train_batch_size: int = 8, max_steps: int = -1, num_train_epochs: int = 3, max_grad_norm: float = 1.0, logging_steps: int = 50, save_steps: int = 100)[source]
Train a model (a fine-tuning sketch follows the parameter list below).

Parameters:
- train_data_set (DataLoader) – training data set
- dev_data_set (Union[DataLoader, List[DataLoader]], optional) – development set. Defaults to None.
- test_data_set (Union[DataLoader, List[DataLoader]], optional) – test set. Defaults to None.
- gradient_accumulation_steps (int, optional) – number of gradient accumulation steps. Defaults to 1.
- per_gpu_train_batch_size (int, optional) – per GPU train batch size. Defaults to 8.
- max_steps (int, optional) – max steps. Defaults to -1.
- num_train_epochs (int, optional) – number of train epochs. Defaults to 3.
- max_grad_norm (float, optional) – max gradient normalization. Defaults to 1.0.
- logging_steps (int, optional) – number of steps between logging. Defaults to 50.
- save_steps (int, optional) – number of steps between model save. Defaults to 100.
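An end-to-end fine-tuning sketch under the assumptions of the previous examples (classifier and train_loader as built above, dev_examples an additional list of SequenceClsInputExample); the output path is hypothetical:

    from torch.utils.data import DataLoader, SequentialSampler

    dev_dataset = classifier.convert_to_tensors(dev_examples, max_seq_length=128)
    dev_loader = DataLoader(dev_dataset, sampler=SequentialSampler(dev_dataset), batch_size=8)

    classifier.train(
        train_loader,
        dev_data_set=dev_loader,
        num_train_epochs=3,
        logging_steps=50,
        save_steps=100,
    )
    classifier.save_model(output_dir="/tmp/my_bert_classifier")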
 
nlp_architect.models.transformers.token_classification module
class nlp_architect.models.transformers.token_classification.BertTokenClassificationHead(config)[source]
Bases: transformers.modeling_bert.BertForTokenClassification

BERT token classification head with linear classifier. This head's forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the valid tokens (ignoring the extra word piece tokens generated by the tokenizer, e.g. the 'X' label in NER tasks).
forward(input_ids, token_type_ids=None, attention_mask=None, labels=None, position_ids=None, head_mask=None, valid_ids=None)[source]
The BertForTokenClassification forward method, overrides the __call__() special method.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Parameters:
- input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using transformers.BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.encode_plus() for details.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

Returns:
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided): Classification loss.
- scores (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)): Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type: tuple(torch.FloatTensor) comprising various elements depending on the configuration (BertConfig) and inputs.

Examples:

    from transformers import BertTokenizer, BertForTokenClassification
    import torch

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertForTokenClassification.from_pretrained('bert-base-uncased')
    input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
    labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
    outputs = model(input_ids, labels=labels)
    loss, scores = outputs[:2]
 
class nlp_architect.models.transformers.token_classification.QuantizedBertForTokenClassificationHead(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertForTokenClassification

Quantized BERT token classification head with linear classifier. This head's forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the valid tokens (ignoring the extra word piece tokens generated by the tokenizer, e.g. the 'X' label in NER tasks).
forward(input_ids, token_type_ids=None, attention_mask=None, labels=None, position_ids=None, head_mask=None, valid_ids=None)[source]
The BertForTokenClassification forward method, overrides the __call__() special method.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Parameters:
- input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using transformers.BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.encode_plus() for details.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

Returns:
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided): Classification loss.
- scores (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)): Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type: tuple(torch.FloatTensor) comprising various elements depending on the configuration (BertConfig) and inputs.

Examples:

    from transformers import BertTokenizer, BertForTokenClassification
    import torch

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertForTokenClassification.from_pretrained('bert-base-uncased')
    input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
    labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
    outputs = model(input_ids, labels=labels)
    loss, scores = outputs[:2]
 
class nlp_architect.models.transformers.token_classification.RobertaForTokenClassificationHead(config)[source]
Bases: transformers.modeling_bert.BertPreTrainedModel

RoBERTa token classification head with linear classifier. This head's forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the valid tokens (ignoring the extra word piece tokens generated by the tokenizer, e.g. the 'X' label in NER tasks).
base_model_prefix = 'roberta'

config_class
    alias of transformers.configuration_roberta.RobertaConfig
forward(input_ids, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, labels=None, valid_ids=None)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
pretrained_model_archive_map = {'distilroberta-base': 'https://s3.amazonaws.com/models.huggingface.co/bert/distilroberta-base-pytorch_model.bin', 'roberta-base': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-pytorch_model.bin', 'roberta-base-openai-detector': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-openai-detector-pytorch_model.bin', 'roberta-large': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-pytorch_model.bin', 'roberta-large-mnli': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-mnli-pytorch_model.bin', 'roberta-large-openai-detector': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-openai-detector-pytorch_model.bin'}
class nlp_architect.models.transformers.token_classification.TransformerTokenClassifier(model_type: str, labels: List[str] = None, training_args: bool = None, *args, load_quantized=False, **kwargs)[source]
Bases: nlp_architect.models.transformers.base_model.TransformerBase

Transformer word tagging classifier (a construction sketch follows the MODEL_CLASS listing below).

Parameters:
- model_type (str) – model family (classifier head), choose between bert/quant_bert/roberta/xlnet (the keys of MODEL_CLASS below)
- labels (List[str], optional) – list of tag labels
MODEL_CLASS = {'bert': <class 'nlp_architect.models.transformers.token_classification.BertTokenClassificationHead'>, 'quant_bert': <class 'nlp_architect.models.transformers.token_classification.QuantizedBertForTokenClassificationHead'>, 'roberta': <class 'nlp_architect.models.transformers.token_classification.RobertaForTokenClassificationHead'>, 'xlnet': <class 'nlp_architect.models.transformers.token_classification.XLNetTokenClassificationHead'>}
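A construction sketch for a NER-style tagger (labels are illustrative; forwarding model_name_or_path through **kwargs to TransformerBase is an assumption based on the base-class signature):

    from nlp_architect.models.transformers.token_classification import TransformerTokenClassifier

    tagger = TransformerTokenClassifier(
        model_type="bert",                       # a key of MODEL_CLASS above
        model_name_or_path="bert-base-uncased",  # assumed to be forwarded to TransformerBase
        labels=["O", "B-PER", "I-PER", "B-LOC", "I-LOC"],
    )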
convert_to_tensors(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int = 128, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]
Convert examples to a tensor dataset.

Parameters:
- examples (List[TokenClsInputExample]) – examples
- max_seq_length (int, optional) – max sequence length. Defaults to 128.
- include_labels (bool, optional) – include labels. Defaults to True.

Return type: TensorDataset
evaluate_predictions(logits, label_ids)[source]
Run evaluation of given logits and truth labels.

Parameters:
- logits – model logits
- label_ids – truth label ids
 
inference(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int, batch_size: int = 64)[source]
Run inference on the given examples.

Parameters:
- examples (List[TokenClsInputExample]) – examples
- max_seq_length (int) – max sequence length
- batch_size (int, optional) – batch size. Defaults to 64.

Returns: logits
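An inference sketch for the tagger built above; the TokenClsInputExample constructor arguments shown are assumptions, so check nlp_architect.data.sequential_tagging for the exact signature:

    from nlp_architect.data.sequential_tagging import TokenClsInputExample

    sentence = "John lives in New York"
    # guid/text/tokens arguments are hypothetical placeholders
    example = TokenClsInputExample(guid="infer-1", text=sentence, tokens=sentence.split())
    logits = tagger.inference([example], max_seq_length=128, batch_size=32)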
train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, test_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, gradient_accumulation_steps: int = 1, per_gpu_train_batch_size: int = 8, max_steps: int = -1, num_train_epochs: int = 3, max_grad_norm: float = 1.0, logging_steps: int = 50, save_steps: int = 100, best_result_file: str = None)[source]
Run model training (a training sketch follows the parameter list below).

Parameters:
- train_data_set (DataLoader) – training dataset
- dev_data_set (Union[DataLoader, List[DataLoader]], optional) – development data set (can be a list). Defaults to None.
- test_data_set (Union[DataLoader, List[DataLoader]], optional) – test data set (can be a list). Defaults to None.
- gradient_accumulation_steps (int, optional) – gradient accumulation steps. Defaults to 1.
- per_gpu_train_batch_size (int, optional) – per GPU (or CPU) train batch size. Defaults to 8.
- max_steps (int, optional) – max steps for training. Defaults to -1.
- num_train_epochs (int, optional) – number of training epochs. Defaults to 3.
- max_grad_norm (float, optional) – max gradient norm. Defaults to 1.0.
- logging_steps (int, optional) – number of steps between logging. Defaults to 50.
- save_steps (int, optional) – number of steps between model save. Defaults to 100.
- best_result_file (str, optional) – path for saving best dev results when updated. Defaults to None.
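A training sketch for the token classifier, assuming train_examples and dev_examples are lists of TokenClsInputExample as in the inference example above; paths and hyperparameters are illustrative:

    from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

    train_ds = tagger.convert_to_tensors(train_examples, max_seq_length=128)
    dev_ds = tagger.convert_to_tensors(dev_examples, max_seq_length=128)

    tagger.train(
        DataLoader(train_ds, sampler=RandomSampler(train_ds), batch_size=8),
        dev_data_set=DataLoader(dev_ds, sampler=SequentialSampler(dev_ds), batch_size=8),
        num_train_epochs=3,
        best_result_file="/tmp/ner_best_dev.txt",  # hypothetical path
    )
    tagger.save_model(output_dir="/tmp/my_bert_tagger")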
 
class nlp_architect.models.transformers.token_classification.XLNetTokenClassificationHead(config)[source]
Bases: transformers.modeling_xlnet.XLNetPreTrainedModel

XLNet token classification head with linear classifier. This head's forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the valid tokens (ignoring the extra word piece tokens generated by the tokenizer, e.g. the 'X' label in NER tasks).
forward(input_ids, token_type_ids=None, input_mask=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, labels=None, head_mask=None, valid_ids=None)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.