nlp_architect.nn.torch.modules package

Submodules

nlp_architect.nn.torch.modules.embedders module

class nlp_architect.nn.torch.modules.embedders.CNNLSTM(word_vocab_size: int, num_labels: int, word_embedding_dims: int = 100, char_embedding_dims: int = 16, cnn_kernel_size: int = 3, cnn_num_filters: int = 128, lstm_hidden_size: int = 100, lstm_layers: int = 2, bidir: bool = True, dropout: float = 0.5, padding_idx: int = 0)[source]

Bases: torch.nn.modules.module.Module

CNN-LSTM embedder (based on Ma and Hovy, 2016)

Parameters:
  • word_vocab_size (int) – word vocabulary size
  • num_labels (int) – number of labels (classifier)
  • word_embedding_dims (int, optional) – word embedding dims
  • char_embedding_dims (int, optional) – character embedding dims
  • cnn_kernel_size (int, optional) – character CNN kernel size
  • cnn_num_filters (int, optional) – character CNN number of filters
  • lstm_hidden_size (int, optional) – LSTM embedder hidden size
  • lstm_layers (int, optional) – num of LSTM layers
  • bidir (bool, optional) – apply bi-directional LSTM
  • dropout (float, optional) – dropout rate
  • padding_idx (int, optional) – padding index for embedding layers
forward(words, word_chars, **kwargs)[source]

CNN-LSTM forward step

Parameters:
  • words (torch.tensor) – words
  • word_chars (torch.tensor) – word character tensors
Returns:

logits of model

Return type:

torch.tensor
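The two inputs to forward can be prepared from tokenized text roughly as follows. This is a minimal sketch, not the library's own preprocessing: the helper encode_batch, the UNK index, and both vocabularies are hypothetical, and it assumes words is laid out as (batch, sequence length) and word_chars as (batch, sequence length, word length), padded with padding_idx.

```python
from typing import Dict, List, Tuple

PAD = 0  # matches the default padding_idx=0 in the signature above
UNK = 1  # hypothetical index reserved for out-of-vocabulary items


def encode_batch(
    sentences: List[List[str]],
    word_vocab: Dict[str, int],
    char_vocab: Dict[str, int],
) -> Tuple[List[List[int]], List[List[List[int]]]]:
    """Pad a batch of tokenized sentences into rectangular id lists."""
    max_len = max(len(s) for s in sentences)
    max_word_len = max(len(w) for s in sentences for w in s)

    words, word_chars = [], []
    for sent in sentences:
        # Word ids are looked up on the lower-cased token.
        word_ids = [word_vocab.get(w.lower(), UNK) for w in sent]
        # Character ids are looked up per character and padded per word.
        char_ids = [
            [char_vocab.get(c, UNK) for c in w] + [PAD] * (max_word_len - len(w))
            for w in sent
        ]
        # Pad the sentence out to max_len with all-PAD words.
        n_missing = max_len - len(sent)
        words.append(word_ids + [PAD] * n_missing)
        word_chars.append(
            char_ids + [[PAD] * max_word_len for _ in range(n_missing)]
        )
    return words, word_chars


# Hypothetical vocabularies; indices 0 and 1 are reserved for PAD and UNK.
word_vocab = {"john": 2, "lives": 3, "in": 4, "paris": 5}
char_vocab = {c: i + 2 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

words, word_chars = encode_batch(
    [["John", "lives", "in", "Paris"], ["Paris"]], word_vocab, char_vocab
)
```

Wrapping each result in torch.tensor(...) would give integer tensors that could be passed as words and word_chars.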

classmethod from_config(word_vocab_size: int, num_labels: int, config: str)[source]

Load a model from a configuration file. A valid configuration file is a JSON file whose fields match the arguments of the class __init__.

Parameters:
  • word_vocab_size (int) – word vocabulary size
  • num_labels (int) – number of labels (classifier)
  • config (str) – path to configuration file
Returns:

CNNLSTM module pre-configured

Return type:

CNNLSTM
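A configuration file of the kind described above could be produced like this. It is a sketch only: the keys are copied from the CNNLSTM signature, the values are the documented defaults, and the file name is hypothetical. It assumes word_vocab_size and num_labels are left out of the JSON because from_config takes them as separate arguments.

```python
import json
import os
import tempfile

# Keys mirror the optional arguments of the CNNLSTM signature.
config = {
    "word_embedding_dims": 100,
    "char_embedding_dims": 16,
    "cnn_kernel_size": 3,
    "cnn_num_filters": 128,
    "lstm_hidden_size": 100,
    "lstm_layers": 2,
    "bidir": True,
    "dropout": 0.5,
    "padding_idx": 0,
}

# Hypothetical file name inside a fresh temporary directory.
config_path = os.path.join(tempfile.mkdtemp(), "cnn_lstm_config.json")
with open(config_path, "w", encoding="utf-8") as fp:
    json.dump(config, fp, indent=2)

# Read the file back to confirm it round-trips as plain JSON.
with open(config_path, encoding="utf-8") as fp:
    loaded = json.load(fp)

# With nlp_architect installed, the path would be passed as `config`
# (the vocabulary size and label count here are illustrative):
# model = CNNLSTM.from_config(word_vocab_size=10000, num_labels=9, config=config_path)
```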

load_embeddings(embeddings)[source]

Load pre-defined word embeddings

Parameters: embeddings (torch.tensor) – word embedding tensor
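One way to assemble the values for such an embedding tensor is sketched below. The helper build_embedding_matrix, the vocabulary, and the vectors are all hypothetical. The sketch assumes the tensor has one row per vocabulary index (word_vocab_size rows of word_embedding_dims values), with a zero row at padding_idx and small random values for words that have no pre-trained vector.

```python
import random
from typing import Dict, List


def build_embedding_matrix(
    vocab: Dict[str, int],
    pretrained: Dict[str, List[float]],
    dims: int,
    padding_idx: int = 0,
    seed: int = 0,
) -> List[List[float]]:
    """Return rows aligned with vocabulary indices."""
    rng = random.Random(seed)
    size = max(max(vocab.values()), padding_idx) + 1
    # Start from all zeros so the padding row stays zero.
    matrix = [[0.0] * dims for _ in range(size)]
    for word, idx in vocab.items():
        if idx == padding_idx:
            continue
        if word in pretrained:
            matrix[idx] = list(pretrained[word])
        else:
            # Assumption: unseen words get a small uniform random vector.
            matrix[idx] = [rng.uniform(-0.25, 0.25) for _ in range(dims)]
    return matrix


# Hypothetical data: index 0 is reserved for padding.
vocab = {"john": 1, "paris": 2, "zyzzyva": 3}
pretrained = {"john": [0.1, 0.2, 0.3], "paris": [0.4, 0.5, 0.6]}
matrix = build_embedding_matrix(vocab, pretrained, dims=3)
```

Converting the result with torch.tensor(matrix, dtype=torch.float) would give a 4 x 3 tensor that could be passed as embeddings.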
class nlp_architect.nn.torch.modules.embedders.IDCNN(word_vocab_size: int, num_labels: int, word_embedding_dims: int = 100, shape_vocab_size: int = 4, shape_embedding_dims: int = 5, char_embedding_dims: int = 16, char_cnn_filters: int = 128, char_cnn_kernel_size: int = 3, cnn_kernel_size: int = 3, cnn_num_filters: int = 128, input_dropout: float = 0.35, middle_dropout: float = 0, hidden_dropout: float = 0.15, blocks: int = 1, dilations: List = None, embedding_pad_idx: int = 0, use_chars: bool = False, drop_penalty: float = 0.0001)[source]

Bases: torch.nn.modules.module.Module

ID-CNN (iterated dilated CNN) tagging model (based on Strubell et al., 2017) with word character embedding (using CNN feature extractors)

Parameters:
  • word_vocab_size (int) – word vocabulary size
  • num_labels (int) – number of labels (classifier)
  • word_embedding_dims (int, optional) – word embedding dims
  • shape_vocab_size (int, optional) – shape vocabulary size
  • shape_embedding_dims (int, optional) – shape embedding dims
  • char_embedding_dims (int, optional) – character embedding dims
  • char_cnn_filters (int, optional) – character CNN number of filters
  • char_cnn_kernel_size (int, optional) – character CNN kernel size
  • cnn_kernel_size (int, optional) – CNN embedder kernel size
  • cnn_num_filters (int, optional) – CNN embedder number of filters
  • input_dropout (float, optional) – input layer (embedding) dropout rate
  • middle_dropout (float, optional) – middle layer dropout rate
  • hidden_dropout (float, optional) – hidden layer dropout rate
  • blocks (int, optional) – number of blocks
  • dilations (List, optional) – list of dilations per CNN layer
  • embedding_pad_idx (int, optional) – padding index for embedding layers
  • use_chars (bool, optional) – whether to use char embedding, defaults to False
  • drop_penalty (float, optional) – penalty for dropout regularization
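To get a feel for how cnn_kernel_size, dilations and blocks interact, the receptive field of a stack of stride-1 dilated convolutions can be computed directly. This is a sketch of the general formula, not of this class's internals: the helper receptive_field and the dilation values are hypothetical (the signature's default is None), and any layers outside the repeated block are ignored.

```python
from typing import List


def receptive_field(kernel_size: int, dilations: List[int], blocks: int = 1) -> int:
    """Tokens seen by one output position of stacked stride-1 dilated convolutions."""
    # Each layer with dilation d widens the field by (kernel_size - 1) * d.
    per_block = sum((kernel_size - 1) * d for d in dilations)
    # Repeating the same block adds the same widening again.
    return 1 + blocks * per_block


example_dilations = [1, 2, 4]  # hypothetical values, not a library default

single = receptive_field(3, example_dilations)
iterated = receptive_field(3, example_dilations, blocks=2)
print(single, iterated)  # prints "15 29"
```

Doubling the dilation at each layer grows the context exponentially with depth, whereas three undilated layers with the same kernel size would cover only 7 tokens.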
forward(words, word_chars, shapes, no_dropout=False, **kwargs)[source]

IDCNN forward step

Parameters:
  • words (torch.tensor) – words
  • word_chars (torch.tensor) – word character tensors
  • shapes (torch.tensor) – words shapes
Returns:

logits of model

Return type:

torch.tensor

classmethod from_config(word_vocab_size: int, num_labels: int, config: str)[source]

Load a model from a configuration file. A valid configuration file is a JSON file whose fields match the arguments of the class __init__.

Parameters:
  • word_vocab_size (int) – word vocabulary size
  • num_labels (int) – number of labels (classifier)
  • config (str) – path to configuration file
Returns:

IDCNN module pre-configured

Return type:

IDCNN

load_embeddings(embeddings)[source]

Load pre-defined word embeddings

Parameters: embeddings (torch.tensor) – word embedding tensor

Module contents