nlp_architect.models.transformers package
Submodules
nlp_architect.models.transformers.base_model module
class nlp_architect.models.transformers.base_model.InputFeatures(input_ids, input_mask, segment_ids, label_id=None, valid_ids=None)[source]
Bases: object

A single set of features of data.
class nlp_architect.models.transformers.base_model.TransformerBase(model_type: str, model_name_or_path: str, labels: List[str] = None, num_labels: int = None, config_name=None, tokenizer_name=None, do_lower_case=False, output_path=None, device='cpu', n_gpus=0)[source]
Bases: nlp_architect.models.TrainableModel

Transformers base model (for working with pytorch-transformers models).
MODEL_CONFIGURATIONS = {'bert': (<class 'transformers.configuration_bert.BertConfig'>, <class 'transformers.tokenization_bert.BertTokenizer'>), 'quant_bert': (<class 'nlp_architect.models.transformers.quantized_bert.QuantizedBertConfig'>, <class 'transformers.tokenization_bert.BertTokenizer'>), 'roberta': (<class 'transformers.configuration_roberta.RobertaConfig'>, <class 'transformers.tokenization_roberta.RobertaTokenizer'>), 'xlm': (<class 'transformers.configuration_xlm.XLMConfig'>, <class 'transformers.tokenization_xlm.XLMTokenizer'>), 'xlnet': (<class 'transformers.configuration_xlnet.XLNetConfig'>, <class 'transformers.tokenization_xlnet.XLNetTokenizer'>)}
static get_train_steps_epochs(max_steps: int, num_train_epochs: int, gradient_accumulation_steps: int, num_samples: int)[source]
Get the total number of training steps and epochs.

Parameters:
- max_steps (int) – max steps
- num_train_epochs (int) – number of epochs
- gradient_accumulation_steps (int) – gradient accumulation steps
- num_samples (int) – number of samples

Returns: total steps, number of epochs
Return type: Tuple
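A brief usage sketch (values are illustrative; treating num_samples as the number of training batches and deriving the step count from the epoch count when max_steps is -1 is an assumption about the helper's behavior):

    # Usage sketch for the static helper above (values are illustrative).
    from nlp_architect.models.transformers.base_model import TransformerBase

    total_steps, epochs = TransformerBase.get_train_steps_epochs(
        max_steps=-1,                    # -1: presumably derive steps from the epoch count
        num_train_epochs=3,
        gradient_accumulation_steps=1,
        num_samples=1000,                # e.g. len(train_dataloader)
    )
    print(total_steps, epochs)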
classmethod load_model(model_path: str, model_type: str, *args, **kwargs)[source]
Create a TransformerBase model from a given path.

Parameters:
- model_path (str) – path to model
- model_type (str) – model type

Returns: model
Return type: TransformerBase
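A hedged loading sketch: the path below is hypothetical and should point at a directory previously written by save_model(); model_type must be one of the keys of MODEL_CONFIGURATIONS above.

    # Reload a previously saved classifier (path is hypothetical).
    from nlp_architect.models.transformers.sequence_classification import TransformerSequenceClassifier

    model = TransformerSequenceClassifier.load_model(
        model_path="/tmp/my_bert_classifier",
        model_type="bert",
    )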
optimizer
save_model(output_dir: str, save_checkpoint: bool = False, args=None)[source]
Save the model, tokenizer, and arguments to the given output directory (see the sketch after the parameter list).

Parameters:
- output_dir (str) – path to output directory
- save_checkpoint (bool, optional) – save as checkpoint. Defaults to False.
- args ([type], optional) – arguments object to save. Defaults to None.
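A minimal save sketch, assuming model is a trained TransformerBase subclass and the output path is hypothetical:

    # Persist model weights and tokenizer files so they can later be restored
    # with load_model(); save_checkpoint=True presumably writes a checkpoint-style save.
    model.save_model(output_dir="/tmp/my_bert_classifier", save_checkpoint=False)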
 
save_model_checkpoint(output_path: str, name: str)[source]
Save a model checkpoint.

Parameters:
- output_path (str) – output path
- name (str) – name of checkpoint
 
scheduler
nlp_architect.models.transformers.quantized_bert module
Quantized BERT layers and model
class nlp_architect.models.transformers.quantized_bert.QuantizedBertAttention(config)[source]
Bases: transformers.modeling_bert.BertAttention
class nlp_architect.models.transformers.quantized_bert.QuantizedBertConfig(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, **kwargs)[source]
Bases: transformers.configuration_bert.BertConfig

pretrained_config_archive_map = {'bert-base-uncased': 'https://d2zs9tzlek599f.cloudfront.net/models/transformers/bert-base-uncased.json', 'bert-large-uncased': 'https://d2zs9tzlek599f.cloudfront.net/models/transformers/bert-large-uncased.json'}
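Since QuantizedBertConfig subclasses BertConfig, the standard transformers from_pretrained() API applies; a small sketch (the identifier matches a key of pretrained_config_archive_map above):

    # Load a quantized-BERT configuration by name (resolved through the
    # pretrained_config_archive_map shown above).
    from nlp_architect.models.transformers.quantized_bert import QuantizedBertConfig

    config = QuantizedBertConfig.from_pretrained("bert-base-uncased")
    print(config.num_hidden_layers)  # 12 for the base configuration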
class nlp_architect.models.transformers.quantized_bert.QuantizedBertEmbeddings(config)[source]
Bases: transformers.modeling_bert.BertEmbeddings

class nlp_architect.models.transformers.quantized_bert.QuantizedBertEncoder(config)[source]
Bases: transformers.modeling_bert.BertEncoder

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForQuestionAnswering(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForQuestionAnswering

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForSequenceClassification(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForSequenceClassification

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForTokenClassification(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForTokenClassification

class nlp_architect.models.transformers.quantized_bert.QuantizedBertIntermediate(config)[source]
Bases: transformers.modeling_bert.BertIntermediate

class nlp_architect.models.transformers.quantized_bert.QuantizedBertLayer(config)[source]
Bases: transformers.modeling_bert.BertLayer

class nlp_architect.models.transformers.quantized_bert.QuantizedBertModel(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertModel

class nlp_architect.models.transformers.quantized_bert.QuantizedBertOutput(config)[source]
Bases: transformers.modeling_bert.BertOutput

class nlp_architect.models.transformers.quantized_bert.QuantizedBertPooler(config)[source]
Bases: transformers.modeling_bert.BertPooler
class nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel(config, *inputs, **kwargs)[source]
Bases: transformers.modeling_bert.BertPreTrainedModel

base_model_prefix = 'quant_bert'

config_class
    alias of QuantizedBertConfig
class nlp_architect.models.transformers.quantized_bert.QuantizedBertSelfAttention(config)[source]
Bases: transformers.modeling_bert.BertSelfAttention

class nlp_architect.models.transformers.quantized_bert.QuantizedBertSelfOutput(config)[source]
Bases: transformers.modeling_bert.BertSelfOutput

nlp_architect.models.transformers.sequence_classification module
class nlp_architect.models.transformers.sequence_classification.TransformerSequenceClassifier(model_type: str, labels: List[str] = None, task_type='classification', metric_fn=<function accuracy>, load_quantized=False, *args, **kwargs)[source]
Bases: nlp_architect.models.transformers.base_model.TransformerBase

Transformer sequence classifier (a construction sketch follows the parameter list below).

Parameters:
- model_type (str) – transformer base model type
- labels (List[str], optional) – list of labels. Defaults to None.
- task_type (str, optional) – task type (classification/regression). Defaults to classification.
- metric_fn ([type], optional) – metric to use for evaluation. Defaults to simple_accuracy.
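A construction sketch (argument values are illustrative; passing model_name_or_path and other TransformerBase arguments through **kwargs is an assumption based on the base-class signature above):

    from nlp_architect.models.transformers.sequence_classification import TransformerSequenceClassifier

    classifier = TransformerSequenceClassifier(
        model_type="bert",                       # a key of MODEL_CLASS below
        model_name_or_path="bert-base-uncased",  # forwarded to TransformerBase
        labels=["negative", "positive"],
        task_type="classification",
    )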
 
MODEL_CLASS = {'bert': <class 'transformers.modeling_bert.BertForSequenceClassification'>, 'quant_bert': <class 'nlp_architect.models.transformers.quantized_bert.QuantizedBertForSequenceClassification'>, 'roberta': <class 'transformers.modeling_roberta.RobertaForSequenceClassification'>, 'xlm': <class 'transformers.modeling_xlm.XLMForSequenceClassification'>, 'xlnet': <class 'transformers.modeling_xlnet.XLNetForSequenceClassification'>}
convert_to_tensors(examples: List[nlp_architect.data.sequence_classification.SequenceClsInputExample], max_seq_length: int = 128, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]
Convert examples to a tensor dataset.

Parameters:
- examples (List[SequenceClsInputExample]) – examples
- max_seq_length (int, optional) – max sequence length. Defaults to 128.
- include_labels (bool, optional) – include labels. Defaults to True.

Return type: TensorDataset
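A conversion sketch, reusing the classifier built above; the SequenceClsInputExample constructor arguments shown (guid, text, label) are assumptions about nlp_architect.data.sequence_classification:

    from torch.utils.data import DataLoader, RandomSampler
    from nlp_architect.data.sequence_classification import SequenceClsInputExample

    train_examples = [
        SequenceClsInputExample(guid="train-1", text="great movie", label="positive"),
        SequenceClsInputExample(guid="train-2", text="terrible plot", label="negative"),
    ]
    train_dataset = classifier.convert_to_tensors(train_examples, max_seq_length=128)
    train_loader = DataLoader(train_dataset, sampler=RandomSampler(train_dataset), batch_size=8)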
evaluate_predictions(logits, label_ids)[source]
Run evaluation of given logits and truth labels.

Parameters:
- logits – model logits
- label_ids – truth label ids
 
inference(examples: List[nlp_architect.data.sequence_classification.SequenceClsInputExample], max_seq_length: int, batch_size: int = 64, evaluate=False)[source]
Run inference on the given examples.

Parameters:
- examples (List[SequenceClsInputExample]) – examples
- max_seq_length (int) – max sequence length
- batch_size (int, optional) – batch size. Defaults to 64.

Returns: logits
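An inference sketch (inputs are illustrative); the returned value is the model logits, and evaluate=True presumably also runs the configured metric when labels are present:

    test_examples = [SequenceClsInputExample(guid="test-1", text="an instant classic")]
    logits = classifier.inference(test_examples, max_seq_length=128, batch_size=32)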
train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, test_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, gradient_accumulation_steps: int = 1, per_gpu_train_batch_size: int = 8, max_steps: int = -1, num_train_epochs: int = 3, max_grad_norm: float = 1.0, logging_steps: int = 50, save_steps: int = 100)[source]
Train a model (a fine-tuning sketch follows the parameter list below).

Parameters:
- train_data_set (DataLoader) – training data set
- dev_data_set (Union[DataLoader, List[DataLoader]], optional) – development set. Defaults to None.
- test_data_set (Union[DataLoader, List[DataLoader]], optional) – test set. Defaults to None.
- gradient_accumulation_steps (int, optional) – number of gradient accumulation steps. Defaults to 1.
- per_gpu_train_batch_size (int, optional) – per GPU train batch size. Defaults to 8.
- max_steps (int, optional) – max steps. Defaults to -1.
- num_train_epochs (int, optional) – number of train epochs. Defaults to 3.
- max_grad_norm (float, optional) – max gradient normalization. Defaults to 1.0.
- logging_steps (int, optional) – number of steps between logging. Defaults to 50.
- save_steps (int, optional) – number of steps between model save. Defaults to 100.
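An end-to-end fine-tuning sketch under the assumptions of the previous examples (classifier and train_loader as built above, dev_examples an additional list of SequenceClsInputExample); the output path is hypothetical:

    from torch.utils.data import DataLoader, SequentialSampler

    dev_dataset = classifier.convert_to_tensors(dev_examples, max_seq_length=128)
    dev_loader = DataLoader(dev_dataset, sampler=SequentialSampler(dev_dataset), batch_size=8)

    classifier.train(
        train_loader,
        dev_data_set=dev_loader,
        num_train_epochs=3,
        logging_steps=50,
        save_steps=100,
    )
    classifier.save_model(output_dir="/tmp/my_bert_classifier")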
 
nlp_architect.models.transformers.token_classification module
class nlp_architect.models.transformers.token_classification.BertTokenClassificationHead(config)[source]
Bases: transformers.modeling_bert.BertForTokenClassification

BERT token classification head with linear classifier. This head's forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the valid tokens (ignoring the extra word piece tokens generated by the tokenizer, e.g. the 'X' label in NER tasks).
forward(input_ids, token_type_ids=None, attention_mask=None, labels=None, position_ids=None, head_mask=None, valid_ids=None)[source]
The BertForTokenClassification forward method, overrides the __call__() special method.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Parameters:
- input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using transformers.BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.encode_plus() for details.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

Returns:
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided): Classification loss.
- scores (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)): Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type: tuple(torch.FloatTensor) comprising various elements depending on the configuration (BertConfig) and inputs.

Examples:

    from transformers import BertTokenizer, BertForTokenClassification
    import torch

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertForTokenClassification.from_pretrained('bert-base-uncased')
    input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
    labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
    outputs = model(input_ids, labels=labels)
    loss, scores = outputs[:2]
 
class nlp_architect.models.transformers.token_classification.QuantizedBertForTokenClassificationHead(config)[source]
Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertForTokenClassification

Quantized BERT token classification head with linear classifier. This head's forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the valid tokens (ignoring the extra word piece tokens generated by the tokenizer, e.g. the 'X' label in NER tasks).
forward(input_ids, token_type_ids=None, attention_mask=None, labels=None, position_ids=None, head_mask=None, valid_ids=None)[source]
The BertForTokenClassification forward method, overrides the __call__() special method.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Parameters:
- input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Indices can be obtained using transformers.BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.encode_plus() for details.
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token.
- position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
- head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

Returns:
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided): Classification loss.
- scores (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)): Classification scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type: tuple(torch.FloatTensor) comprising various elements depending on the configuration (BertConfig) and inputs.

Examples:

    from transformers import BertTokenizer, BertForTokenClassification
    import torch

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertForTokenClassification.from_pretrained('bert-base-uncased')
    input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
    labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
    outputs = model(input_ids, labels=labels)
    loss, scores = outputs[:2]
 
class nlp_architect.models.transformers.token_classification.RobertaForTokenClassificationHead(config)[source]
Bases: transformers.modeling_bert.BertPreTrainedModel

RoBERTa token classification head with linear classifier. This head's forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the valid tokens (ignoring the extra word piece tokens generated by the tokenizer, e.g. the 'X' label in NER tasks).
base_model_prefix = 'roberta'

config_class
    alias of transformers.configuration_roberta.RobertaConfig
forward(input_ids, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, labels=None, valid_ids=None)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
pretrained_model_archive_map = {'distilroberta-base': 'https://s3.amazonaws.com/models.huggingface.co/bert/distilroberta-base-pytorch_model.bin', 'roberta-base': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-pytorch_model.bin', 'roberta-base-openai-detector': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-openai-detector-pytorch_model.bin', 'roberta-large': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-pytorch_model.bin', 'roberta-large-mnli': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-mnli-pytorch_model.bin', 'roberta-large-openai-detector': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-openai-detector-pytorch_model.bin'}
class nlp_architect.models.transformers.token_classification.TransformerTokenClassifier(model_type: str, labels: List[str] = None, training_args: bool = None, *args, load_quantized=False, **kwargs)[source]
Bases: nlp_architect.models.transformers.base_model.TransformerBase

Transformer word tagging classifier (a construction sketch follows the MODEL_CLASS listing below).

Parameters:
- model_type (str) – model family (classifier head), choose between bert/quant_bert/roberta/xlnet (the keys of MODEL_CLASS below)
- labels (List[str], optional) – list of tag labels
MODEL_CLASS = {'bert': <class 'nlp_architect.models.transformers.token_classification.BertTokenClassificationHead'>, 'quant_bert': <class 'nlp_architect.models.transformers.token_classification.QuantizedBertForTokenClassificationHead'>, 'roberta': <class 'nlp_architect.models.transformers.token_classification.RobertaForTokenClassificationHead'>, 'xlnet': <class 'nlp_architect.models.transformers.token_classification.XLNetTokenClassificationHead'>}
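A construction sketch for a NER-style tagger (labels are illustrative; forwarding model_name_or_path through **kwargs to TransformerBase is an assumption based on the base-class signature):

    from nlp_architect.models.transformers.token_classification import TransformerTokenClassifier

    tagger = TransformerTokenClassifier(
        model_type="bert",                       # a key of MODEL_CLASS above
        model_name_or_path="bert-base-uncased",  # assumed to be forwarded to TransformerBase
        labels=["O", "B-PER", "I-PER", "B-LOC", "I-LOC"],
    )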
convert_to_tensors(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int = 128, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]
Convert examples to a tensor dataset.

Parameters:
- examples (List[TokenClsInputExample]) – examples
- max_seq_length (int, optional) – max sequence length. Defaults to 128.
- include_labels (bool, optional) – include labels. Defaults to True.

Return type: TensorDataset
evaluate_predictions(logits, label_ids)[source]
Run evaluation of given logits and truth labels.

Parameters:
- logits – model logits
- label_ids – truth label ids
 
inference(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int, batch_size: int = 64)[source]
Run inference on the given examples.

Parameters:
- examples (List[TokenClsInputExample]) – examples
- max_seq_length (int) – max sequence length
- batch_size (int, optional) – batch size. Defaults to 64.

Returns: logits
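An inference sketch for the tagger built above; the TokenClsInputExample constructor arguments shown are assumptions, so check nlp_architect.data.sequential_tagging for the exact signature:

    from nlp_architect.data.sequential_tagging import TokenClsInputExample

    sentence = "John lives in New York"
    # guid/text/tokens arguments are hypothetical placeholders
    example = TokenClsInputExample(guid="infer-1", text=sentence, tokens=sentence.split())
    logits = tagger.inference([example], max_seq_length=128, batch_size=32)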
train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, test_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, gradient_accumulation_steps: int = 1, per_gpu_train_batch_size: int = 8, max_steps: int = -1, num_train_epochs: int = 3, max_grad_norm: float = 1.0, logging_steps: int = 50, save_steps: int = 100, best_result_file: str = None)[source]
Run model training (a training sketch follows the parameter list below).

Parameters:
- train_data_set (DataLoader) – training dataset
- dev_data_set (Union[DataLoader, List[DataLoader]], optional) – development data set (can be a list). Defaults to None.
- test_data_set (Union[DataLoader, List[DataLoader]], optional) – test data set (can be a list). Defaults to None.
- gradient_accumulation_steps (int, optional) – gradient accumulation steps. Defaults to 1.
- per_gpu_train_batch_size (int, optional) – per GPU (or CPU) train batch size. Defaults to 8.
- max_steps (int, optional) – max steps for training. Defaults to -1.
- num_train_epochs (int, optional) – number of training epochs. Defaults to 3.
- max_grad_norm (float, optional) – max gradient norm. Defaults to 1.0.
- logging_steps (int, optional) – number of steps between logging. Defaults to 50.
- save_steps (int, optional) – number of steps between model save. Defaults to 100.
- best_result_file (str, optional) – path for saving best dev results when updated. Defaults to None.
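A training sketch for the token classifier, assuming train_examples and dev_examples are lists of TokenClsInputExample as in the inference example above; paths and hyperparameters are illustrative:

    from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

    train_ds = tagger.convert_to_tensors(train_examples, max_seq_length=128)
    dev_ds = tagger.convert_to_tensors(dev_examples, max_seq_length=128)

    tagger.train(
        DataLoader(train_ds, sampler=RandomSampler(train_ds), batch_size=8),
        dev_data_set=DataLoader(dev_ds, sampler=SequentialSampler(dev_ds), batch_size=8),
        num_train_epochs=3,
        best_result_file="/tmp/ner_best_dev.txt",  # hypothetical path
    )
    tagger.save_model(output_dir="/tmp/my_bert_tagger")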
 
class nlp_architect.models.transformers.token_classification.XLNetTokenClassificationHead(config)[source]
Bases: transformers.modeling_xlnet.XLNetPreTrainedModel

XLNet token classification head with linear classifier. This head's forward ignores word piece tokens in its linear layer.
The forward pass requires an additional 'valid_ids' map that selects the valid tokens (ignoring the extra word piece tokens generated by the tokenizer, e.g. the 'X' label in NER tasks).
forward(input_ids, token_type_ids=None, input_mask=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, labels=None, head_mask=None, valid_ids=None)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.