nlp_architect.models.transformers package
Submodules
nlp_architect.models.transformers.base_model module
- class nlp_architect.models.transformers.base_model.InputFeatures(input_ids, input_mask, segment_ids, label_id=None, valid_ids=None)[source]
Bases: object
A single set of features of data.
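For illustration, a single padded example could be wrapped as follows (all ids and masks below are made-up values):

from nlp_architect.models.transformers.base_model import InputFeatures

# Toy values for a sequence padded to length 6; the ids are illustrative only.
features = InputFeatures(
    input_ids=[101, 7592, 2088, 102, 0, 0],   # [CLS] hello world [SEP] <pad> <pad>
    input_mask=[1, 1, 1, 1, 0, 0],            # 1 = real token, 0 = padding
    segment_ids=[0, 0, 0, 0, 0, 0],           # single-segment input
    label_id=None,
)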
- class nlp_architect.models.transformers.base_model.TransformerBase(model_type: str, model_name_or_path: str, labels: List[str] = None, num_labels: int = None, config_name=None, tokenizer_name=None, do_lower_case=False, output_path=None, device='cpu', n_gpus=0)[source]
Bases: nlp_architect.models.TrainableModel
Transformers base model (for working with pytorch-transformers models)
- MODEL_CONFIGURATIONS = {'bert': (<class 'transformers.configuration_bert.BertConfig'>, <class 'transformers.tokenization_bert.BertTokenizer'>), 'quant_bert': (<class 'nlp_architect.models.transformers.quantized_bert.QuantizedBertConfig'>, <class 'transformers.tokenization_bert.BertTokenizer'>), 'roberta': (<class 'transformers.configuration_roberta.RobertaConfig'>, <class 'transformers.tokenization_roberta.RobertaTokenizer'>), 'xlm': (<class 'transformers.configuration_xlm.XLMConfig'>, <class 'transformers.tokenization_xlm.XLMTokenizer'>), 'xlnet': (<class 'transformers.configuration_xlnet.XLNetConfig'>, <class 'transformers.tokenization_xlnet.XLNetTokenizer'>)}
- static get_train_steps_epochs(max_steps: int, num_train_epochs: int, gradient_accumulation_steps: int, num_samples: int)[source]
Get train steps and epochs.
Parameters: - max_steps (int) – max steps
- num_train_epochs (int) – num epochs
- gradient_accumulation_steps (int) – gradient accumulation steps
- num_samples (int) – number of samples
Returns: total steps, number of epochs
Return type: Tuple
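The arithmetic behind this helper is roughly as follows (a hedged sketch, not the library's exact implementation): a positive max_steps takes precedence over num_train_epochs, otherwise total steps are derived from the number of samples, the accumulation steps, and the epoch count.

# Hypothetical re-implementation of the step/epoch arithmetic, for illustration only.
def approx_train_steps_epochs(max_steps, num_train_epochs,
                              gradient_accumulation_steps, num_samples):
    steps_per_epoch = max(num_samples // gradient_accumulation_steps, 1)
    if max_steps > 0:
        # A fixed step budget overrides the requested number of epochs.
        return max_steps, (max_steps // steps_per_epoch) + 1
    return steps_per_epoch * num_train_epochs, num_train_epochs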
- classmethod load_model(model_path: str, model_type: str, *args, **kwargs)[source]
Create a TransformerBase model from a given path.
Parameters: - model_path (str) – path to model
- model_type (str) – model type
Returns: model
Return type:
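For example, a model written earlier with save_model() can be restored like this (the path is illustrative):

from nlp_architect.models.transformers.sequence_classification import TransformerSequenceClassifier

# Illustrative path; any directory produced by save_model() should work here.
model = TransformerSequenceClassifier.load_model(
    model_path="/tmp/bert_classifier_out",
    model_type="bert",
)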
- optimizer
- save_model(output_dir: str, save_checkpoint: bool = False, args=None)[source]
Save model/tokenizer/arguments to the given output directory.
Parameters: - output_dir (str) – path to output directory
- save_checkpoint (bool, optional) – save as checkpoint. Defaults to False.
- args ([type], optional) – arguments object to save. Defaults to None.
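Typical usage sketch (model is assumed to be a trained TransformerBase subclass instance; the output directory is illustrative):

# Writes the model weights, tokenizer files and arguments to the directory.
model.save_model(output_dir="/tmp/bert_classifier_out", save_checkpoint=False)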
- save_model_checkpoint(output_path: str, name: str)[source]
Save a model checkpoint.
Parameters: - output_path (str) – output path
- name (str) – name of checkpoint
- scheduler
nlp_architect.models.transformers.quantized_bert module
Quantized BERT layers and model
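For instance, a BERT-Base-sized quantized configuration can be created directly (the argument values mirror the constructor defaults documented below):

from nlp_architect.models.transformers.quantized_bert import QuantizedBertConfig

# BERT-Base-sized settings; these match the constructor defaults listed below.
config = QuantizedBertConfig(hidden_size=768, num_hidden_layers=12, num_attention_heads=12)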
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertAttention(config)[source]
Bases: transformers.modeling_bert.BertAttention
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertConfig(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, **kwargs)[source]
Bases: transformers.configuration_bert.BertConfig
- pretrained_config_archive_map = {'bert-base-uncased': 'https://d2zs9tzlek599f.cloudfront.net/models/transformers/bert-base-uncased.json', 'bert-large-uncased': 'https://d2zs9tzlek599f.cloudfront.net/models/transformers/bert-large-uncased.json'}
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertEmbeddings(config)[source]
Bases: transformers.modeling_bert.BertEmbeddings
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertEncoder(config)[source]
Bases: transformers.modeling_bert.BertEncoder
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertForQuestionAnswering(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForQuestionAnswering
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertForSequenceClassification(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForSequenceClassification
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertForTokenClassification(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForTokenClassification
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertIntermediate(config)[source]
Bases: transformers.modeling_bert.BertIntermediate
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertLayer(config)[source]
Bases: transformers.modeling_bert.BertLayer
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertModel(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertModel
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertOutput(config)[source]
Bases: transformers.modeling_bert.BertOutput
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertPooler(config)[source]
Bases: transformers.modeling_bert.BertPooler
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel(config, *inputs, **kwargs)[source]
Bases: transformers.modeling_bert.BertPreTrainedModel
- base_model_prefix = 'quant_bert'
- config_class: alias of QuantizedBertConfig
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertSelfAttention(config)[source]
Bases: transformers.modeling_bert.BertSelfAttention
- class nlp_architect.models.transformers.quantized_bert.QuantizedBertSelfOutput(config)[source]
Bases: transformers.modeling_bert.BertSelfOutput
nlp_architect.models.transformers.sequence_classification module
- class nlp_architect.models.transformers.sequence_classification.TransformerSequenceClassifier(model_type: str, labels: List[str] = None, task_type='classification', metric_fn=<function accuracy>, load_quantized=False, *args, **kwargs)[source]
Bases: nlp_architect.models.transformers.base_model.TransformerBase
Transformer sequence classifier
Parameters: - model_type (str) – transformer base model type
- labels (List[str], optional) – list of labels. Defaults to None.
- task_type (str, optional) – task type (classification/regression). Defaults to classification.
- metric_fn ([type], optional) – metric to use for evaluation. Defaults to simple_accuracy.
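A minimal instantiation might look as follows (keyword arguments beyond the ones listed above are assumed to be forwarded to TransformerBase, e.g. model_name_or_path and device):

from nlp_architect.models.transformers.sequence_classification import TransformerSequenceClassifier

# Illustrative setup for binary sentence classification on CPU.
classifier = TransformerSequenceClassifier(
    model_type="bert",
    model_name_or_path="bert-base-uncased",
    labels=["negative", "positive"],
    task_type="classification",
    device="cpu",
    n_gpus=0,
)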
- MODEL_CLASS = {'bert': <class 'transformers.modeling_bert.BertForSequenceClassification'>, 'quant_bert': <class 'nlp_architect.models.transformers.quantized_bert.QuantizedBertForSequenceClassification'>, 'roberta': <class 'transformers.modeling_roberta.RobertaForSequenceClassification'>, 'xlm': <class 'transformers.modeling_xlm.XLMForSequenceClassification'>, 'xlnet': <class 'transformers.modeling_xlnet.XLNetForSequenceClassification'>}
- convert_to_tensors(examples: List[nlp_architect.data.sequence_classification.SequenceClsInputExample], max_seq_length: int = 128, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]
Convert examples to a tensor dataset.
Parameters: - examples (List[SequenceClsInputExample]) – examples
- max_seq_length (int, optional) – max sequence length. Defaults to 128.
- include_labels (bool, optional) – include labels. Defaults to True.
Return type: TensorDataset
- evaluate_predictions(logits, label_ids)[source]
Run evaluation of given logits and truth labels.
Parameters: - logits – model logits
- label_ids – truth label ids
- inference(examples: List[nlp_architect.data.sequence_classification.SequenceClsInputExample], max_seq_length: int, batch_size: int = 64, evaluate=False)[source]
Run inference on given examples.
Parameters: - examples (List[SequenceClsInputExample]) – examples
- batch_size (int, optional) – batch size. Defaults to 64.
Returns: logits
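Usage sketch (assuming the classifier instance from the example above and a list of SequenceClsInputExample objects in test_examples):

# Returns raw logits for the given examples (see the evaluate flag in the signature above).
logits = classifier.inference(test_examples, max_seq_length=128, batch_size=64)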
- train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, test_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, gradient_accumulation_steps: int = 1, per_gpu_train_batch_size: int = 8, max_steps: int = -1, num_train_epochs: int = 3, max_grad_norm: float = 1.0, logging_steps: int = 50, save_steps: int = 100)[source]
Train a model.
Parameters: - train_data_set (DataLoader) – training data set
- dev_data_set (Union[DataLoader, List[DataLoader]], optional) – development set. Defaults to None.
- test_data_set (Union[DataLoader, List[DataLoader]], optional) – test set. Defaults to None.
- gradient_accumulation_steps (int, optional) – num of gradient accumulation steps. Defaults to 1.
- per_gpu_train_batch_size (int, optional) – per GPU train batch size. Defaults to 8.
- max_steps (int, optional) – max steps. Defaults to -1.
- num_train_epochs (int, optional) – number of train epochs. Defaults to 3.
- max_grad_norm (float, optional) – max gradient normalization. Defaults to 1.0.
- logging_steps (int, optional) – number of steps between logging. Defaults to 50.
- save_steps (int, optional) – number of steps between model save. Defaults to 100.
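Putting it together, a training run could look like this sketch (train_examples is assumed to be a list of SequenceClsInputExample objects built elsewhere; paths and hyperparameters are illustrative):

from torch.utils.data import DataLoader, RandomSampler

# Convert the raw examples to tensors and wrap them in a DataLoader.
train_dataset = classifier.convert_to_tensors(train_examples, max_seq_length=128)
train_loader = DataLoader(train_dataset,
                          sampler=RandomSampler(train_dataset),
                          batch_size=8)

classifier.train(train_loader,
                 gradient_accumulation_steps=1,
                 per_gpu_train_batch_size=8,
                 num_train_epochs=3,
                 logging_steps=50,
                 save_steps=100)
classifier.save_model(output_dir="/tmp/bert_classifier_out")  # illustrative path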
nlp_architect.models.transformers.token_classification module
- class nlp_architect.models.transformers.token_classification.BertTokenClassificationHead(config)[source]
Bases: transformers.modeling_bert.BertForTokenClassification
BERT token classification head with linear classifier. This head’s forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the tensors of valid tokens (i.e., it ignores the extra word-piece tokens produced by the tokenizer, which in NER tasks carry the 'X' label).
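As a rough illustration of what such a valid_ids filter does (this is a sketch of the idea, not the library's implementation): hidden states at word-piece continuation positions are dropped before the linear classifier, so only the first sub-token of each word is scored.

import torch

hidden_states = torch.randn(1, 5, 4)            # 1 sentence, 5 sub-tokens, hidden size 4
valid_ids = torch.tensor([[1, 1, 0, 1, 0]])     # 1 = first sub-token of a word, 0 = continuation piece

# Keep only valid positions, left-aligned, padding the rest with zeros.
filtered = torch.zeros_like(hidden_states)
for b in range(hidden_states.size(0)):
    keep = hidden_states[b][valid_ids[b] == 1]
    filtered[b, :keep.size(0)] = keep
# `filtered` is what a head like this would feed into its linear classification layer.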
- forward(input_ids, token_type_ids=None, attention_mask=None, labels=None, position_ids=None, head_mask=None, valid_ids=None)[source]
The BertForTokenClassification forward method, overrides the __call__() special method.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Parameters: - input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using transformers.BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.encode_plus() for details.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
Returns: - loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification loss.
- scores (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)) – Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Return type: tuple(torch.FloatTensor) comprising various elements depending on the configuration (BertConfig) and inputs
Examples:

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)
loss, scores = outputs[:2]
- class nlp_architect.models.transformers.token_classification.QuantizedBertForTokenClassificationHead(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertForTokenClassification
Quantized BERT token classification head with linear classifier. This head’s forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the tensors of valid tokens (i.e., it ignores the extra word-piece tokens produced by the tokenizer, which in NER tasks carry the 'X' label).
- forward(input_ids, token_type_ids=None, attention_mask=None, labels=None, position_ids=None, head_mask=None, valid_ids=None)[source]
The BertForTokenClassification forward method, overrides the __call__() special method.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Parameters: - input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using transformers.BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.encode_plus() for details.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
Returns: - loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification loss.
- scores (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)) – Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Return type: tuple(torch.FloatTensor) comprising various elements depending on the configuration (BertConfig) and inputs
Examples:

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)
loss, scores = outputs[:2]
- class nlp_architect.models.transformers.token_classification.RobertaForTokenClassificationHead(config)[source]
Bases: transformers.modeling_bert.BertPreTrainedModel
RoBERTa token classification head with linear classifier. This head’s forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the tensors of valid tokens (i.e., it ignores the extra word-piece tokens produced by the tokenizer, which in NER tasks carry the 'X' label).
- base_model_prefix = 'roberta'
- config_class: alias of transformers.configuration_roberta.RobertaConfig
- forward(input_ids, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, labels=None, valid_ids=None)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- pretrained_model_archive_map = {'distilroberta-base': 'https://s3.amazonaws.com/models.huggingface.co/bert/distilroberta-base-pytorch_model.bin', 'roberta-base': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-pytorch_model.bin', 'roberta-base-openai-detector': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-openai-detector-pytorch_model.bin', 'roberta-large': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-pytorch_model.bin', 'roberta-large-mnli': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-mnli-pytorch_model.bin', 'roberta-large-openai-detector': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-openai-detector-pytorch_model.bin'}
- class nlp_architect.models.transformers.token_classification.TransformerTokenClassifier(model_type: str, labels: List[str] = None, training_args: bool = None, *args, load_quantized=False, **kwargs)[source]
Bases: nlp_architect.models.transformers.base_model.TransformerBase
Transformer word tagging classifier
Parameters: - model_type (str) – model family (classifier head), choose between bert/quant_bert/roberta/xlnet
- labels (List[str], optional) – list of tag labels
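A minimal instantiation for an NER-style tagger might look like this (extra keyword arguments are assumed to be forwarded to TransformerBase; the label set is illustrative):

from nlp_architect.models.transformers.token_classification import TransformerTokenClassifier

# Illustrative word tagger for a small BIO label set, running on CPU.
tagger = TransformerTokenClassifier(
    model_type="bert",
    model_name_or_path="bert-base-uncased",
    labels=["O", "B-PER", "I-PER", "B-LOC", "I-LOC"],
    device="cpu",
    n_gpus=0,
)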
- MODEL_CLASS = {'bert': <class 'nlp_architect.models.transformers.token_classification.BertTokenClassificationHead'>, 'quant_bert': <class 'nlp_architect.models.transformers.token_classification.QuantizedBertForTokenClassificationHead'>, 'roberta': <class 'nlp_architect.models.transformers.token_classification.RobertaForTokenClassificationHead'>, 'xlnet': <class 'nlp_architect.models.transformers.token_classification.XLNetTokenClassificationHead'>}
- convert_to_tensors(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int = 128, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]
Convert examples to a tensor dataset.
Parameters: - examples (List[TokenClsInputExample]) – examples
- max_seq_length (int, optional) – max sequence length. Defaults to 128.
- include_labels (bool, optional) – include labels. Defaults to True.
Return type: TensorDataset
- evaluate_predictions(logits, label_ids)[source]
Run evaluation of given logits and truth labels.
Parameters: - logits – model logits
- label_ids – truth label ids
- inference(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int, batch_size: int = 64)[source]
Run inference on given examples.
Parameters: - examples (List[TokenClsInputExample]) – examples
- batch_size (int, optional) – batch size. Defaults to 64.
Returns: logits
- train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, test_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, gradient_accumulation_steps: int = 1, per_gpu_train_batch_size: int = 8, max_steps: int = -1, num_train_epochs: int = 3, max_grad_norm: float = 1.0, logging_steps: int = 50, save_steps: int = 100, best_result_file: str = None)[source]
Run model training.
Parameters: - train_data_set (DataLoader) – training dataset
- dev_data_set (Union[DataLoader, List[DataLoader]], optional) – development data set (can be a list). Defaults to None.
- test_data_set (Union[DataLoader, List[DataLoader]], optional) – test data set (can be a list). Defaults to None.
- gradient_accumulation_steps (int, optional) – gradient accumulation steps. Defaults to 1.
- per_gpu_train_batch_size (int, optional) – per GPU train batch size. Defaults to 8.
- max_steps (int, optional) – max steps for training. Defaults to -1.
- num_train_epochs (int, optional) – number of training epochs. Defaults to 3.
- max_grad_norm (float, optional) – max gradient norm. Defaults to 1.0.
- logging_steps (int, optional) – number of steps between logging. Defaults to 50.
- save_steps (int, optional) – number of steps between model save. Defaults to 100.
- best_result_file (str, optional) – path to save best dev results when it’s updated.
- class nlp_architect.models.transformers.token_classification.XLNetTokenClassificationHead(config)[source]
Bases: transformers.modeling_xlnet.XLNetPreTrainedModel
XLNet token classification head with linear classifier. This head’s forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the tensors of valid tokens (i.e., it ignores the extra word-piece tokens produced by the tokenizer, which in NER tasks carry the 'X' label).
- forward(input_ids, token_type_ids=None, input_mask=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, labels=None, head_mask=None, valid_ids=None)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.