nlp_architect.common.cdc package

Submodules

nlp_architect.common.cdc.cluster module

class nlp_architect.common.cdc.cluster.Cluster(coref_chain: int = -1)

Bases: object

add_mention(mention: nlp_architect.common.cdc.mention_data.MentionData) → None
get_cluster_id() → str
Returns: a unique cluster ID generated from the IDs of the cluster's mentions
get_mentions()
merge_clusters(cluster) → None
Parameters: cluster – the cluster to merge with this cluster
class nlp_architect.common.cdc.cluster.Clusters(topic_id: str, mentions: List[nlp_architect.common.cdc.mention_data.MentionData] = None)

Bases: object

add_cluster(cluster: nlp_architect.common.cdc.cluster.Cluster) → None
add_clusters(clusters) → None
clean_clusters() → None

Remove all clusters that were already merged with other clusters

cluster_coref_chain = 1000
set_coref_chain_to_mentions() → None

Set the coref ID of every mention in a cluster to that cluster's coref chain ID

set_initial_clusters(mentions: List[nlp_architect.common.cdc.mention_data.MentionData]) → None
Parameters: mentions (list[MentionData], required) – the initial mentions to create the clusters from
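
The cluster lifecycle described above (seed clusters from mentions, merge, clean up, propagate coref IDs) can be illustrated with a small stand-in. This is a sketch under assumptions, not the nlp_architect implementation: SketchMention, SketchCluster, SketchClusters, the merged flag, the one-cluster-per-mention seeding, and the "$"-joined ID format are all hypothetical. Only the method names and the cluster_coref_chain = 1000 starting value are taken from the reference above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchMention:
    """Hypothetical stand-in for MentionData; only the fields used here."""
    mention_id: str
    predicted_coref_chain: Optional[int] = None


@dataclass
class SketchCluster:
    """Hypothetical stand-in for Cluster."""
    coref_chain: int = -1
    mentions: List[SketchMention] = field(default_factory=list)
    merged: bool = False  # assumption: marks clusters absorbed by another one

    def add_mention(self, mention: SketchMention) -> None:
        self.mentions.append(mention)

    def get_cluster_id(self) -> str:
        # Assumption: the ID is the mention IDs joined with a separator.
        return "$".join(m.mention_id for m in self.mentions)

    def merge_clusters(self, other: "SketchCluster") -> None:
        # Absorb the other cluster's mentions and flag it as merged.
        self.mentions.extend(other.mentions)
        other.merged = True


class SketchClusters:
    """Hypothetical stand-in for Clusters."""
    cluster_coref_chain = 1000  # class-level value, as in the reference above

    def __init__(self, topic_id: str,
                 mentions: Optional[List[SketchMention]] = None):
        self.topic_id = topic_id
        self.clusters_list: List[SketchCluster] = []
        self.set_initial_clusters(mentions or [])

    def set_initial_clusters(self, mentions: List[SketchMention]) -> None:
        # Assumption: every mention starts in its own singleton cluster.
        for mention in mentions:
            cluster = SketchCluster(SketchClusters.cluster_coref_chain)
            cluster.add_mention(mention)
            self.clusters_list.append(cluster)
            SketchClusters.cluster_coref_chain += 1

    def clean_clusters(self) -> None:
        # Drop clusters that were already merged into other clusters.
        self.clusters_list = [c for c in self.clusters_list if not c.merged]

    def set_coref_chain_to_mentions(self) -> None:
        # Copy each cluster's coref chain ID onto its mentions.
        for cluster in self.clusters_list:
            for mention in cluster.mentions:
                mention.predicted_coref_chain = cluster.coref_chain


mentions = [SketchMention("m1"), SketchMention("m2"), SketchMention("m3")]
clusters = SketchClusters("topic_1", mentions)
first, second, third = clusters.clusters_list
first.merge_clusters(second)
clusters.clean_clusters()
clusters.set_coref_chain_to_mentions()
```

After the merge and clean-up, two clusters remain, and the mentions "m1" and "m2" share the coref ID of the cluster that absorbed them.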

nlp_architect.common.cdc.mention_data module

class nlp_architect.common.cdc.mention_data.MentionData(topic_id: str, doc_id: str, sent_id: int, tokens_numbers: List[int], tokens_str: str, mention_context: List[str], mention_head: str, mention_head_lemma: str, coref_chain: str, mention_type: str = 'NA', is_continuous: bool = True, is_singleton: bool = False, score: float = -1.0, predicted_coref_chain: str = None, mention_pos: str = None, mention_ner: str = None, mention_index: int = -1)

Bases: nlp_architect.common.cdc.mention_data.MentionDataLight

gen_mention_id() → str
get_mention_id() → str
get_tokens()
static read_json_mention_data_line(mention_line: str)
Parameters: mention_line – a JSON representation of a single mention
Returns: MentionData object
static read_mentions_json_to_mentions_data_list(mentions_json_file: str)
Parameters: mentions_json_file – the path of the mentions JSON file to read
Returns: List[MentionData]
static static_gen_token_unique_id(doc_id: int, sent_id: int, token_id: int) → str
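
To make the JSON-reading helpers above more concrete, the following sketch parses one JSON mention line into a small record and builds a token ID from it. It is an illustration under assumptions rather than the library code: SketchMentionData, sketch_read_mention_line, and sketch_token_unique_id are hypothetical names, the JSON keys are assumed to mirror the constructor parameter names, and the underscore-joined ID format is a guess.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SketchMentionData:
    """Hypothetical stand-in holding a subset of the MentionData fields."""
    topic_id: str
    doc_id: str
    sent_id: int
    tokens_numbers: List[int]
    tokens_str: str
    coref_chain: str
    mention_type: str = "NA"
    is_singleton: bool = False


def sketch_token_unique_id(doc_id, sent_id, token_id) -> str:
    # Assumption: the unique ID is the three parts joined by underscores.
    return "_".join(str(part) for part in (doc_id, sent_id, token_id))


def sketch_read_mention_line(mention_line: str) -> SketchMentionData:
    # Assumption: JSON keys mirror the constructor parameter names.
    raw = json.loads(mention_line)
    # Keep only the keys this stand-in knows about; ignore the rest.
    known = {key: raw[key]
             for key in SketchMentionData.__dataclass_fields__
             if key in raw}
    return SketchMentionData(**known)


line = json.dumps({
    "topic_id": "topic_1",
    "doc_id": "doc_7",
    "sent_id": 3,
    "tokens_numbers": [12, 13],
    "tokens_str": "the earthquake",
    "coref_chain": "chain_42",
    "score": -1.0,  # not modelled by the stand-in, so it is dropped
})
mention = sketch_read_mention_line(line)
first_token_id = sketch_token_unique_id(
    mention.doc_id, mention.sent_id, mention.tokens_numbers[0]
)
```

Optional fields missing from the JSON line fall back to the defaults listed in the constructor signature, such as mention_type = 'NA'.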
class nlp_architect.common.cdc.mention_data.MentionDataLight(tokens_str: str, mention_context: str = None, mention_head: str = None, mention_head_lemma: str = None, mention_pos: str = None, mention_ner: str = None)

Bases: object

nlp_architect.common.cdc.topics module

class nlp_architect.common.cdc.topics.Topic(topic_id)

Bases: object

class nlp_architect.common.cdc.topics.Topics

Bases: object

create_from_file(mentions_file_path: str, keep_order: bool = False) → None
Parameters:
  • keep_order – whether to keep the original order of the mentions (default = False)
  • mentions_file_path – path to this topic's mentions JSON file
load_mentions_from_file(mentions_file_path: str) → List[nlp_architect.common.cdc.topics.Topic]
order_mentions_by_topics(mentions: str) → List[nlp_architect.common.cdc.topics.Topic]

Order mentions by the topics of their documents

Parameters: mentions – json mentions file
Returns: List[Topic] of the mentions separated by their documents' topics
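
Grouping mentions by their documents' topics, as order_mentions_by_topics is described to do, might look roughly like the sketch below. It is a minimal illustration under assumptions: SketchTopic and sketch_order_mentions_by_topics are hypothetical names, mentions are represented as plain dicts with a "topic_id" key, and topics are returned in first-seen order, which may differ from the library's behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchTopic:
    """Hypothetical stand-in for Topic: an ID plus the mentions it owns."""
    topic_id: str
    mentions: List[dict] = field(default_factory=list)


def sketch_order_mentions_by_topics(mentions: List[dict]) -> List[SketchTopic]:
    # Plain dicts keep insertion order, so topics come out in first-seen order.
    topics: Dict[str, SketchTopic] = {}
    for mention in mentions:
        topic_id = mention["topic_id"]
        if topic_id not in topics:
            topics[topic_id] = SketchTopic(topic_id)
        topics[topic_id].mentions.append(mention)
    return list(topics.values())


sample_mentions = [
    {"topic_id": "2_ecb", "tokens_str": "earthquake"},
    {"topic_id": "1_ecb", "tokens_str": "arrested"},
    {"topic_id": "2_ecb", "tokens_str": "quake"},
]
topics = sketch_order_mentions_by_topics(sample_mentions)
```

Here the two "2_ecb" mentions end up in one topic and the single "1_ecb" mention in another, with mentions inside each topic kept in their original order.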

Module contents