nlp_architect.nn.torch.modules package
Submodules
nlp_architect.nn.torch.modules.embedders module
class nlp_architect.nn.torch.modules.embedders.CNNLSTM(word_vocab_size: int, num_labels: int, word_embedding_dims: int = 100, char_embedding_dims: int = 16, cnn_kernel_size: int = 3, cnn_num_filters: int = 128, lstm_hidden_size: int = 100, lstm_layers: int = 2, bidir: bool = True, dropout: float = 0.5, padding_idx: int = 0)[source]
Bases: torch.nn.modules.module.Module
CNN-LSTM embedder (based on Ma and Hovy, 2016).
Parameters:
- word_vocab_size (int) – word vocabulary size
- num_labels (int) – number of labels (classifier)
- word_embedding_dims (int, optional) – word embedding dims
- char_embedding_dims (int, optional) – character embedding dims
- cnn_kernel_size (int, optional) – character CNN kernel size
- cnn_num_filters (int, optional) – character CNN number of filters
- lstm_hidden_size (int, optional) – LSTM embedder hidden size
- lstm_layers (int, optional) – num of LSTM layers
- bidir (bool, optional) – apply bi-directional LSTM
- dropout (float, optional) – dropout rate
- padding_idx (int, optional) – padding number for embedding layers
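A minimal construction sketch, assuming a hypothetical 10,000-word vocabulary and 9 output labels; all other arguments keep the defaults shown in the signature above:

    from nlp_architect.nn.torch.modules.embedders import CNNLSTM

    # Hypothetical sizes: 10,000-word vocabulary, 9 tag labels
    model = CNNLSTM(word_vocab_size=10000, num_labels=9,
                    lstm_hidden_size=200)  # overriding one default for illustration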
forward(words, word_chars, **kwargs)[source]
CNN-LSTM forward step.
Parameters:
- words (torch.tensor) – word ID tensor
- word_chars (torch.tensor) – word character ID tensor
Returns: logits of model
Return type: torch.tensor
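A forward-pass sketch built on the model constructed above; the tensor shapes (batch, sentence length) for word IDs and (batch, sentence length, max word length) for character IDs, and the character-ID range, are assumptions about the expected input layout rather than anything stated in this reference:

    import torch

    batch_size, seq_len, max_word_len = 2, 20, 15  # assumed dimensions
    words = torch.randint(0, 10000, (batch_size, seq_len))                  # word IDs
    word_chars = torch.randint(0, 50, (batch_size, seq_len, max_word_len))  # character IDs (range is an assumption)

    logits = model(words, word_chars)  # invokes CNNLSTM.forward
    # logits holds per-token label scores suitable for a tagging loss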
classmethod from_config(word_vocab_size: int, num_labels: int, config: str)[source]
Load a model from a configuration file. A valid configuration file is a JSON file with fields as in the class __init__.
Parameters:
- word_vocab_size (int) – word vocabulary size
- num_labels (int) – number of labels (classifier)
- config (str) – path to configuration file
Returns: CNNLSTM module pre-configured
Return type: CNNLSTM
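A from_config sketch; the file name and field values are hypothetical, but the fields mirror the __init__ arguments documented above:

    from nlp_architect.nn.torch.modules.embedders import CNNLSTM

    # cnnlstm_config.json (hypothetical contents):
    # {"word_embedding_dims": 100, "char_embedding_dims": 16,
    #  "lstm_hidden_size": 200, "lstm_layers": 2, "bidir": true, "dropout": 0.5}
    model = CNNLSTM.from_config(word_vocab_size=10000, num_labels=9,
                                config="cnnlstm_config.json")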
class nlp_architect.nn.torch.modules.embedders.IDCNN(word_vocab_size: int, num_labels: int, word_embedding_dims: int = 100, shape_vocab_size: int = 4, shape_embedding_dims: int = 5, char_embedding_dims: int = 16, char_cnn_filters: int = 128, char_cnn_kernel_size: int = 3, cnn_kernel_size: int = 3, cnn_num_filters: int = 128, input_dropout: float = 0.35, middle_dropout: float = 0, hidden_dropout: float = 0.15, blocks: int = 1, dilations: List = None, embedding_pad_idx: int = 0, use_chars: bool = False, drop_penalty: float = 0.0001)[source]
Bases: torch.nn.modules.module.Module
ID-CNN (iterated dilated CNN) tagging model (based on Strubell et al., 2017) with word character embeddings (using CNN feature extractors).
Parameters:
- word_vocab_size (int) – word vocabulary size
- num_labels (int) – number of labels (classifier)
- word_embedding_dims (int, optional) – word embedding dims
- shape_vocab_size (int, optional) – shape vocabulary size
- shape_embedding_dims (int, optional) – shape embedding dims
- char_embedding_dims (int, optional) – character embedding dims
- char_cnn_filters (int, optional) – character CNN number of filters
- char_cnn_kernel_size (int, optional) – character CNN kernel size
- cnn_kernel_size (int, optional) – CNN embedder kernel size
- cnn_num_filters (int, optional) – CNN embedder number of filters
- input_dropout (float, optional) – input layer (embedding) dropout rate
- middle_dropout (float, optional) – middle layer dropout rate
- hidden_dropout (float, optional) – hidden layer dropout rate
- blocks (int, optional) – number of blocks
- dilations (List, optional) – list of dilations per CNN layer
- embedding_pad_idx (int, optional) – padding number for embedding layers
- use_chars (bool, optional) – whether to use char embedding, defaults to False
- drop_penalty (float, optional) – penalty for dropout regularization
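A minimal construction sketch for IDCNN, assuming a hypothetical vocabulary and label count; the dilation schedule shown is an illustrative choice, not a recommended setting:

    from nlp_architect.nn.torch.modules.embedders import IDCNN

    model = IDCNN(word_vocab_size=10000, num_labels=9,
                  dilations=[1, 2, 4],  # assumed per-layer dilation rates for the block
                  use_chars=True)       # enable the character CNN feature extractor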
forward(words, word_chars, shapes, no_dropout=False, **kwargs)[source]
IDCNN forward step.
Parameters:
- words (torch.tensor) – word ID tensor
- word_chars (torch.tensor) – word character ID tensor
- shapes (torch.tensor) – word shape ID tensor
Returns: logits of model
Return type: torch.tensor
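A forward-pass sketch for the IDCNN built above; the tensor shapes and the reading of no_dropout (disable dropout for this pass, e.g. when computing the drop_penalty regularizer) are assumptions, not documented behavior:

    import torch

    batch_size, seq_len, max_word_len = 2, 20, 15  # assumed dimensions
    words = torch.randint(0, 10000, (batch_size, seq_len))                  # word IDs
    word_chars = torch.randint(0, 50, (batch_size, seq_len, max_word_len))  # character IDs (range is an assumption)
    shapes = torch.randint(0, 4, (batch_size, seq_len))  # shape IDs (shape_vocab_size defaults to 4)

    logits = model(words, word_chars, shapes)
    clean_logits = model(words, word_chars, shapes, no_dropout=True)  # assumption: dropout skipped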
classmethod from_config(word_vocab_size: int, num_labels: int, config: str)[source]
Load a model from a configuration file. A valid configuration file is a JSON file with fields as in the class __init__.
Parameters:
- word_vocab_size (int) – word vocabulary size
- num_labels (int) – number of labels (classifier)
- config (str) – path to configuration file
Returns: IDCNN module pre-configured
Return type: IDCNN
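The same from_config pattern applies to IDCNN; the path and JSON contents below are hypothetical, with fields mirroring the __init__ arguments:

    from nlp_architect.nn.torch.modules.embedders import IDCNN

    # idcnn_config.json (hypothetical contents):
    # {"cnn_num_filters": 128, "blocks": 1, "dilations": [1, 2, 4], "use_chars": true}
    model = IDCNN.from_config(word_vocab_size=10000, num_labels=9,
                              config="idcnn_config.json")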