nlp_architect.models.transformers package

Submodules

nlp_architect.models.transformers.base_model module

class nlp_architect.models.transformers.base_model.InputFeatures(input_ids, input_mask, segment_ids, label_id=None, valid_ids=None)[source]

Bases: object

A single set of features of data.

class nlp_architect.models.transformers.base_model.TransformerBase(model_type: str, model_name_or_path: str, labels: List[str] = None, num_labels: int = None, config_name=None, tokenizer_name=None, do_lower_case=False, output_path=None, device='cpu', n_gpus=0)[source]

Bases: nlp_architect.models.TrainableModel

Transformers base model (for working with pytorch-transformers models)

MODEL_CONFIGURATIONS = {'bert': (<class 'transformers.configuration_bert.BertConfig'>, <class 'transformers.tokenization_bert.BertTokenizer'>), 'quant_bert': (<class 'nlp_architect.models.transformers.quantized_bert.QuantizedBertConfig'>, <class 'transformers.tokenization_bert.BertTokenizer'>), 'roberta': (<class 'transformers.configuration_roberta.RobertaConfig'>, <class 'transformers.tokenization_roberta.RobertaTokenizer'>), 'xlm': (<class 'transformers.configuration_xlm.XLMConfig'>, <class 'transformers.tokenization_xlm.XLMTokenizer'>), 'xlnet': (<class 'transformers.configuration_xlnet.XLNetConfig'>, <class 'transformers.tokenization_xlnet.XLNetTokenizer'>)}
evaluate_predictions(logits, label_ids)[source]
get_logits(batch)[source]

Get model logits for the given input batch

static get_train_steps_epochs(max_steps: int, num_train_epochs: int, gradient_accumulation_steps: int, num_samples: int)[source]

Get the total number of training steps and the number of epochs

Parameters:
  • max_steps (int) – max steps
  • num_train_epochs (int) – num epochs
  • gradient_accumulation_steps (int) – gradient accumulation steps
  • num_samples (int) – number of samples
Returns:

total steps, number of epochs

Return type:

Tuple
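
The relationship between these four arguments can be sketched as follows. This is a minimal illustration, assuming the convention found in common fine-tuning scripts, where a positive max_steps overrides num_train_epochs. The helper name sketch_train_steps_epochs is hypothetical, num_samples is assumed to count batches per epoch, and the library's actual arithmetic may differ.

```python
from typing import Tuple


def sketch_train_steps_epochs(
    max_steps: int,
    num_train_epochs: int,
    gradient_accumulation_steps: int,
    num_samples: int,
) -> Tuple[int, int]:
    """Hypothetical re-implementation, not the library's code."""
    # One optimizer update happens every `gradient_accumulation_steps` batches.
    updates_per_epoch = max(num_samples // gradient_accumulation_steps, 1)
    if max_steps > 0:
        # A positive max_steps fixes the total; derive the epochs needed to reach it.
        total_steps = max_steps
        num_train_epochs = max_steps // updates_per_epoch + 1
    else:
        # Otherwise the total follows from the requested number of epochs.
        total_steps = updates_per_epoch * num_train_epochs
    return total_steps, num_train_epochs


print(sketch_train_steps_epochs(-1, 3, 2, 100))   # (150, 3)
print(sketch_train_steps_epochs(120, 3, 2, 100))  # (120, 3)
```

The returned total is the kind of value that would typically be passed as total_steps when setting up a warmup schedule.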

classmethod load_model(model_path: str, model_type: str, *args, **kwargs)[source]

Create a TransformerBase model from the given path

Parameters:
  • model_path (str) – path to model
  • model_type (str) – model type
Returns:

model

Return type:

TransformerBase

optimizer
save_model(output_dir: str, save_checkpoint: bool = False, args=None)[source]

Save model/tokenizer/arguments to given output directory

Parameters:
  • output_dir (str) – path to output directory
  • save_checkpoint (bool, optional) – save as checkpoint. Defaults to False.
  • args ([type], optional) – arguments object to save. Defaults to None.
save_model_checkpoint(output_path: str, name: str)[source]

Save a model checkpoint

Parameters:
  • output_path (str) – output path
  • name (str) – name of checkpoint
scheduler
setup_default_optimizer(weight_decay: float = 0.0, learning_rate: float = 5e-05, adam_epsilon: float = 1e-08, warmup_steps: int = 0, total_steps: int = 0)[source]
to(device='cpu', n_gpus=0)[source]
update_best_model(dev_data_set, test_data_set, best_dev, best_dev_test, best_result_file, save_path=None)[source]
nlp_architect.models.transformers.base_model.get_models(models: List[str])[source]

nlp_architect.models.transformers.quantized_bert module

Quantized BERT layers and model

class nlp_architect.models.transformers.quantized_bert.QuantizedBertAttention(config)[source]

Bases: transformers.modeling_bert.BertAttention

prune_heads(heads)[source]
class nlp_architect.models.transformers.quantized_bert.QuantizedBertConfig(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, **kwargs)[source]

Bases: transformers.configuration_bert.BertConfig

pretrained_config_archive_map = {'bert-base-uncased': 'https://d2zs9tzlek599f.cloudfront.net//models/transformers/bert-base-uncased.json', 'bert-large-uncased': 'https://d2zs9tzlek599f.cloudfront.net//models/transformers/bert-large-uncased.json'}
class nlp_architect.models.transformers.quantized_bert.QuantizedBertEmbeddings(config)[source]

Bases: transformers.modeling_bert.BertEmbeddings

class nlp_architect.models.transformers.quantized_bert.QuantizedBertEncoder(config)[source]

Bases: transformers.modeling_bert.BertEncoder

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForQuestionAnswering(config)[source]

Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForQuestionAnswering

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForSequenceClassification(config)[source]

Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForSequenceClassification

class nlp_architect.models.transformers.quantized_bert.QuantizedBertForTokenClassification(config)[source]

Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertForTokenClassification

class nlp_architect.models.transformers.quantized_bert.QuantizedBertIntermediate(config)[source]

Bases: transformers.modeling_bert.BertIntermediate

class nlp_architect.models.transformers.quantized_bert.QuantizedBertLayer(config)[source]

Bases: transformers.modeling_bert.BertLayer

class nlp_architect.models.transformers.quantized_bert.QuantizedBertModel(config)[source]

Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel, transformers.modeling_bert.BertModel

class nlp_architect.models.transformers.quantized_bert.QuantizedBertOutput(config)[source]

Bases: transformers.modeling_bert.BertOutput

class nlp_architect.models.transformers.quantized_bert.QuantizedBertPooler(config)[source]

Bases: transformers.modeling_bert.BertPooler

class nlp_architect.models.transformers.quantized_bert.QuantizedBertPreTrainedModel(config, *inputs, **kwargs)[source]

Bases: transformers.modeling_bert.BertPreTrainedModel

base_model_prefix = 'quant_bert'
config_class

alias of QuantizedBertConfig

classmethod from_pretrained(pretrained_model_name_or_path, *args, from_8bit=False, **kwargs)[source]

Load a trained model from an 8-bit model

init_weights(module)[source]

Initialize the weights.

save_pretrained(save_directory)[source]

Save the trained model in 8-bit format

toggle_8bit(mode: bool)[source]
class nlp_architect.models.transformers.quantized_bert.QuantizedBertSelfAttention(config)[source]

Bases: transformers.modeling_bert.BertSelfAttention

class nlp_architect.models.transformers.quantized_bert.QuantizedBertSelfOutput(config)[source]

Bases: transformers.modeling_bert.BertSelfOutput

nlp_architect.models.transformers.quantized_bert.quantized_embedding_setup(config, name, *args, **kwargs)[source]

Get QuantizedEmbedding layer according to config params

nlp_architect.models.transformers.quantized_bert.quantized_linear_setup(config, name, *args, **kwargs)[source]

Get QuantizedLinear layer according to config params
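
For readers unfamiliar with 8-bit quantization, the idea behind quantized layers can be illustrated in plain Python. The sketch below assumes a symmetric linear scheme with one scale per tensor; that scheme, and the function names quantize_symmetric and dequantize, are assumptions made for illustration and are not taken from this package.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers using a single scale (illustrative scheme only)."""
    q_max = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = q_max / max_abs if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the representable range.
    quantized = [max(-q_max, min(q_max, round(v * scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer representation."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_symmetric(weights)
restored = dequantize(q_weights, scale)
print(q_weights)  # integers in [-127, 127]
print(restored)   # close to the original weights
```

The round trip loses at most about one quantization step per value, which is the trade-off an 8-bit model makes for smaller storage.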

nlp_architect.models.transformers.sequence_classification module

class nlp_architect.models.transformers.sequence_classification.TransformerSequenceClassifier(model_type: str, labels: List[str] = None, task_type='classification', metric_fn=<function accuracy>, load_quantized=False, *args, **kwargs)[source]

Bases: nlp_architect.models.transformers.base_model.TransformerBase

Transformer sequence classifier

Parameters:
  • model_type (str) – transformer base model type
  • labels (List[str], optional) – list of labels. Defaults to None.
  • task_type (str, optional) – task type (classification/regression). Defaults to classification.
  • metric_fn ([type], optional) – metric to use for evaluation. Defaults to simple_accuracy.
MODEL_CLASS = {'bert': <class 'transformers.modeling_bert.BertForSequenceClassification'>, 'quant_bert': <class 'nlp_architect.models.transformers.quantized_bert.QuantizedBertForSequenceClassification'>, 'roberta': <class 'transformers.modeling_roberta.RobertaForSequenceClassification'>, 'xlm': <class 'transformers.modeling_xlm.XLMForSequenceClassification'>, 'xlnet': <class 'transformers.modeling_xlnet.XLNetForSequenceClassification'>}
convert_to_tensors(examples: List[nlp_architect.data.sequence_classification.SequenceClsInputExample], max_seq_length: int = 128, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]

Convert examples to tensor dataset

Parameters:
  • examples (List[SequenceClsInputExample]) – examples
  • max_seq_length (int, optional) – max sequence length. Defaults to 128.
  • include_labels (bool, optional) – include labels. Defaults to True.
Returns:

Return type:

TensorDataset

evaluate_predictions(logits, label_ids)[source]

Run evaluation of given logits and truth labels

Parameters:
  • logits – model logits
  • label_ids – truth label ids
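
For the classification task type, evaluation amounts to turning each row of logits into a predicted class and comparing it with the truth label. The following is a minimal sketch, assuming argmax decoding and plain accuracy as the metric; the helper names are hypothetical, and a regression task would use the raw scores instead of an argmax.

```python
from typing import List, Sequence


def argmax_predictions(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest score in each row of logits."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def sketch_accuracy(preds: Sequence[int], label_ids: Sequence[int]) -> float:
    """Fraction of predictions that match the truth labels."""
    correct = sum(int(p == y) for p, y in zip(preds, label_ids))
    return correct / len(label_ids)


logits = [[0.1, 2.3], [1.7, -0.4], [0.2, 0.9]]
label_ids = [1, 0, 0]
preds = argmax_predictions(logits)
score = sketch_accuracy(preds, label_ids)
print(preds)  # [1, 0, 1]
print(score)  # two of three predictions are correct
```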
inference(examples: List[nlp_architect.data.sequence_classification.SequenceClsInputExample], max_seq_length: int, batch_size: int = 64, evaluate=False)[source]

Run inference on given examples

Parameters:
Returns:

logits

train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, test_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, gradient_accumulation_steps: int = 1, per_gpu_train_batch_size: int = 8, max_steps: int = -1, num_train_epochs: int = 3, max_grad_norm: float = 1.0, logging_steps: int = 50, save_steps: int = 100)[source]

Train a model

Parameters:
  • train_data_set (DataLoader) – training data set
  • dev_data_set (Union[DataLoader, List[DataLoader]], optional) – development set. Defaults to None.
  • test_data_set (Union[DataLoader, List[DataLoader]], optional) – test set. Defaults to None.
  • gradient_accumulation_steps (int, optional) – num of gradient accumulation steps. Defaults to 1.
  • per_gpu_train_batch_size (int, optional) – per GPU train batch size. Defaults to 8.
  • max_steps (int, optional) – max steps. Defaults to -1.
  • num_train_epochs (int, optional) – number of train epochs. Defaults to 3.
  • max_grad_norm (float, optional) – max gradient normalization. Defaults to 1.0.
  • logging_steps (int, optional) – number of steps between logging. Defaults to 50.
  • save_steps (int, optional) – number of steps between model save. Defaults to 100.
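
How gradient_accumulation_steps, logging_steps and save_steps interact can be shown with a small simulation. This is only a sketch, assuming the common pattern in which one optimizer update is made every gradient_accumulation_steps batches and logging and saving are triggered on update counts; simulate_schedule is a hypothetical helper and does not train anything.

```python
from typing import List, Tuple


def simulate_schedule(
    num_batches: int,
    gradient_accumulation_steps: int = 1,
    logging_steps: int = 50,
    save_steps: int = 100,
) -> Tuple[int, List[int], List[int]]:
    """Return the number of optimizer updates and the steps at which logging/saving fire."""
    global_step = 0
    log_at: List[int] = []
    save_at: List[int] = []
    for batch_idx in range(num_batches):
        # In real training, the loss for this batch would be back-propagated here.
        if (batch_idx + 1) % gradient_accumulation_steps == 0:
            global_step += 1  # one optimizer update after enough accumulated batches
            if logging_steps > 0 and global_step % logging_steps == 0:
                log_at.append(global_step)
            if save_steps > 0 and global_step % save_steps == 0:
                save_at.append(global_step)
    return global_step, log_at, save_at


steps, log_at, save_at = simulate_schedule(400, gradient_accumulation_steps=2)
print(steps)    # 200
print(log_at)   # [50, 100, 150, 200]
print(save_at)  # [100, 200]
```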

nlp_architect.models.transformers.token_classification module

class nlp_architect.models.transformers.token_classification.BertTokenClassificationHead(config)[source]

Bases: transformers.modeling_bert.BertForTokenClassification

BERT token classification head with linear classifier. This head’s forward ignores word piece tokens in its linear layer.

The forward requires an additional ‘valid_ids’ map that maps the tensors for valid tokens (e.g., ignores additional word piece tokens generated by the tokenizer, as in NER task the ‘X’ label).
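
The idea of a valid_ids map can be illustrated without the library. The sketch below assumes the convention that the first word piece of each word is marked 1 and every continuation piece is marked 0. The toy tokenizer toy_word_pieces is a hypothetical stand-in for a real word-piece tokenizer, and special tokens such as [CLS] and [SEP] are left out for brevity.

```python
from typing import List, Tuple


def toy_word_pieces(word: str, piece_len: int = 4) -> List[str]:
    """Hypothetical tokenizer: fixed-size chunks with '##' continuations (non-empty words only)."""
    chunks = [word[i:i + piece_len] for i in range(0, len(word), piece_len)]
    return [chunks[0]] + ["##" + c for c in chunks[1:]]


def build_valid_ids(words: List[str]) -> Tuple[List[str], List[int]]:
    """Tokenize words and mark only the first piece of each word as valid."""
    tokens: List[str] = []
    valid_ids: List[int] = []
    for word in words:
        pieces = toy_word_pieces(word)
        tokens.extend(pieces)
        valid_ids.extend([1] + [0] * (len(pieces) - 1))
    return tokens, valid_ids


tokens, valid_ids = build_valid_ids(["Johanson", "lives", "in", "Ramat"])
kept = [t for t, v in zip(tokens, valid_ids) if v == 1]
print(tokens)     # ['Joha', '##nson', 'live', '##s', 'in', 'Rama', '##t']
print(valid_ids)  # [1, 0, 1, 0, 1, 1, 0]
print(kept)       # ['Joha', 'live', 'in', 'Rama']
```

With such a map, a classifier can score one position per original word, so the number of predictions matches the number of word-level tags.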

forward(input_ids, token_type_ids=None, attention_mask=None, labels=None, position_ids=None, head_mask=None, valid_ids=None)[source]

The BertForTokenClassification forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters:
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary.

    Indices can be obtained using transformers.BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.encode_plus() for details.

    What are input IDs?

  • attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

    What are attention masks?

  • token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) –

    Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

    What are token type IDs?

  • position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) –

    Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

    What are position IDs?

  • head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
  • inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
  • encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
  • encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
  • labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
Returns:

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) :

Classification loss.

scores (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels))

Classification scores (before SoftMax).

hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True):

Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True):

Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type:

tuple(torch.FloatTensor) comprising various elements depending on the configuration (BertConfig) and inputs

Examples:

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased')

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)

loss, scores = outputs[:2]
class nlp_architect.models.transformers.token_classification.QuantizedBertForTokenClassificationHead(config)[source]

Bases: nlp_architect.models.transformers.quantized_bert.QuantizedBertForTokenClassification

Quantized BERT token classification head with linear classifier. This head’s forward ignores word piece tokens in its linear layer.

The forward requires an additional ‘valid_ids’ map that maps the tensors for valid tokens (e.g., ignores additional word piece tokens generated by the tokenizer, as in NER task the ‘X’ label).

forward(input_ids, token_type_ids=None, attention_mask=None, labels=None, position_ids=None, head_mask=None, valid_ids=None)[source]

The BertForTokenClassification forward method, overrides the __call__() special method.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Parameters:
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary.

    Indices can be obtained using transformers.BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.encode_plus() for details.

    What are input IDs?

  • attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.

    What are attention masks?

  • token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) –

    Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 0 corresponds to a sentence A token, 1 corresponds to a sentence B token

    What are token type IDs?

  • position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) –

    Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].

    What are position IDs?

  • head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
  • inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
  • encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
  • encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
  • labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].
Returns:

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) :

Classification loss.

scores (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels))

Classification scores (before SoftMax).

hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True):

Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True):

Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Return type:

tuple(torch.FloatTensor) comprising various elements depending on the configuration (BertConfig) and inputs

Examples:

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased')

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)

loss, scores = outputs[:2]
class nlp_architect.models.transformers.token_classification.RobertaForTokenClassificationHead(config)[source]

Bases: transformers.modeling_bert.BertPreTrainedModel

RoBERTa token classification head with linear classifier. This head’s forward ignores word piece tokens in its linear layer.

The forward requires an additional ‘valid_ids’ map that maps the tensors for valid tokens (e.g., ignores additional word piece tokens generated by the tokenizer, as in NER task the ‘X’ label).

base_model_prefix = 'roberta'
config_class

alias of transformers.configuration_roberta.RobertaConfig

forward(input_ids, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, labels=None, valid_ids=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

pretrained_model_archive_map = {'distilroberta-base': 'https://s3.amazonaws.com/models.huggingface.co/bert/distilroberta-base-pytorch_model.bin', 'roberta-base': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-pytorch_model.bin', 'roberta-base-openai-detector': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-openai-detector-pytorch_model.bin', 'roberta-large': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-pytorch_model.bin', 'roberta-large-mnli': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-mnli-pytorch_model.bin', 'roberta-large-openai-detector': 'https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-openai-detector-pytorch_model.bin'}
class nlp_architect.models.transformers.token_classification.TransformerTokenClassifier(model_type: str, labels: List[str] = None, training_args: bool = None, *args, load_quantized=False, **kwargs)[source]

Bases: nlp_architect.models.transformers.base_model.TransformerBase

Transformer word tagging classifier

Parameters:
  • model_type (str) – model family (classifier head), choose between bert/quant_bert/xlnet
  • labels (List[str], optional) – list of tag labels

MODEL_CLASS = {'bert': <class 'nlp_architect.models.transformers.token_classification.BertTokenClassificationHead'>, 'quant_bert': <class 'nlp_architect.models.transformers.token_classification.QuantizedBertForTokenClassificationHead'>, 'roberta': <class 'nlp_architect.models.transformers.token_classification.RobertaForTokenClassificationHead'>, 'xlnet': <class 'nlp_architect.models.transformers.token_classification.XLNetTokenClassificationHead'>}
convert_to_tensors(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int = 128, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]

Convert examples to tensor dataset

Parameters:
  • examples (List[TokenClsInputExample]) – examples
  • max_seq_length (int, optional) – max sequence length. Defaults to 128.
  • include_labels (bool, optional) – include labels. Defaults to True.
Returns:

Return type:

TensorDataset

evaluate_predictions(logits, label_ids)[source]

Run evaluation of given logits and truth labels

Parameters:
  • logits – model logits
  • label_ids – truth label ids
static extract_labels(label_ids, label_map, logits)[source]
inference(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int, batch_size: int = 64)[source]

Run inference on given examples

Parameters:
Returns:

logits

train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, test_data_set: Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]] = None, gradient_accumulation_steps: int = 1, per_gpu_train_batch_size: int = 8, max_steps: int = -1, num_train_epochs: int = 3, max_grad_norm: float = 1.0, logging_steps: int = 50, save_steps: int = 100, best_result_file: str = None)[source]

Run model training

Parameters:
  • train_data_set (DataLoader) – training dataset
  • dev_data_set (Union[DataLoader, List[DataLoader]], optional) – development data set (can be list). Defaults to None.
  • test_data_set (Union[DataLoader, List[DataLoader]], optional) – test data set (can be list). Defaults to None.
  • gradient_accumulation_steps (int, optional) – gradient accumulation steps. Defaults to 1.
  • per_gpu_train_batch_size (int, optional) – per GPU train batch size (or GPU). Defaults to 8.
  • max_steps (int, optional) – max steps for training. Defaults to -1.
  • num_train_epochs (int, optional) – number of training epochs. Defaults to 3.
  • max_grad_norm (float, optional) – max gradient norm. Defaults to 1.0.
  • logging_steps (int, optional) – number of steps between logging. Defaults to 50.
  • save_steps (int, optional) – number of steps between model save. Defaults to 100.
  • best_result_file (str, optional) – path to save best dev results when it’s updated.
class nlp_architect.models.transformers.token_classification.XLNetTokenClassificationHead(config)[source]

Bases: transformers.modeling_xlnet.XLNetPreTrainedModel

XLNet token classification head with linear classifier. This head’s forward ignores word piece tokens in its linear layer.

The forward requires an additional ‘valid_ids’ map that maps the tensors for valid tokens (e.g., ignores additional word piece tokens generated by the tokenizer, as in NER task the ‘X’ label).

forward(input_ids, token_type_ids=None, input_mask=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, labels=None, head_mask=None, valid_ids=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Module contents