Spacy-NP Annotator

Chunker-based noun phrase annotator

The noun phrase annotator is a plug-in component that can be added to a Spacy pipeline.

The annotator loads a trained SequenceChunker model that predicts chunk labels. It uses these labels to create Spacy Span objects, applies a sequence of filters to produce a set of noun phrases, and finally attaches the noun phrases to the document object.
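The label-to-phrase step can be pictured with a small pure-Python sketch. It makes several assumptions. It assumes the chunker emits BIO-style tags (`B-NP`, `I-NP`, `O`) per token. It uses plain `(start, end)` tuples in place of Spacy Span objects. It also applies a hypothetical filter (`STOP_ONLY`) that drops phrases made only of determiners or pronouns. The actual tag set and filters used by NPAnnotator may differ.

```python
from typing import List, Tuple

# Hypothetical filter vocabulary: phrases made only of these words are dropped.
# This is an illustrative assumption, not NPAnnotator's actual filter list.
STOP_ONLY = {"the", "a", "an", "it", "this", "that"}


def tags_to_spans(tags: List[str], label: str = "NP") -> List[Tuple[int, int]]:
    """Convert BIO chunk tags into (start, end) token spans, end exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == f"B-{label}":
            if start is not None:  # a new chunk closes the previous one
                spans.append((start, i))
            start = i
        elif tag == f"I-{label}":
            if start is None:  # tolerate an I- tag without a preceding B-
                start = i
        else:
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:  # close a chunk that runs to the end of the sentence
        spans.append((start, len(tags)))
    return spans


def filter_spans(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Drop spans whose tokens are all in STOP_ONLY and return phrase strings."""
    phrases = []
    for start, end in spans:
        words = tokens[start:end]
        if all(w.lower() in STOP_ONLY for w in words):
            continue
        phrases.append(" ".join(words))
    return phrases


tokens = ["The", "quick", "brown", "fox", "jumped", "over", "the", "fence"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "O", "O", "B-NP", "I-NP"]
print(filter_spans(tokens, tags_to_spans(tags)))  # ['The quick brown fox', 'the fence']
```

Keeping span detection and filtering as separate steps mirrors the two stages described above. This separation means the filters can be changed without touching the label decoding.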

The annotator is implemented in the NPAnnotator class.

Usage example

Load a Spacy pipeline, add a sentence breaker (required) as the first component, and add NPAnnotator as the last component in the pipeline:

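import spacy
from nlp_architect.pipelines.spacy_np_annotator import NPAnnotator

nlp = spacy.load('en')  # load the English Spacy model
nlp.add_pipe(nlp.create_pipe('sentencizer'), first=True)  # sentence breaker runs first
nlp.add_pipe(NPAnnotator.load(<path_to_model>, <path_to_params>), last=True)  # noun phrase annotator runs last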
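Component order matters because a pipeline runs its components in sequence. A component can therefore only rely on annotations produced by components placed before it. The toy sketch below illustrates this idea. It is not Spacy's API. The `ToyPipeline` class, the naive `sentencizer` and the `per_sentence_annotator` component are all hypothetical. The per-sentence component fails unless sentence boundaries were set earlier.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]  # toy document: a dict of annotations
Component = Callable[[Doc], Doc]


class ToyPipeline:
    """Minimal stand-in for a pipeline: applies components in order."""

    def __init__(self) -> None:
        self.components: List[Component] = []

    def add_pipe(self, component: Component, first: bool = False) -> None:
        # first=True inserts at the front; otherwise the component is appended (runs last).
        if first:
            self.components.insert(0, component)
        else:
            self.components.append(component)

    def __call__(self, text: str) -> Doc:
        doc: Doc = {"text": text}
        for component in self.components:
            doc = component(doc)
        return doc


def sentencizer(doc: Doc) -> Doc:
    # Naive sentence split on '. ' for illustration only.
    doc["sents"] = [s.strip() for s in str(doc["text"]).split(". ") if s.strip()]
    return doc


def per_sentence_annotator(doc: Doc) -> Doc:
    # Requires sentence boundaries set by an earlier component.
    if "sents" not in doc:
        raise ValueError("sentence boundaries missing: add a sentencizer first")
    doc["sent_lengths"] = [len(s.split()) for s in doc["sents"]]
    return doc


nlp = ToyPipeline()
nlp.add_pipe(per_sentence_annotator)   # appended, so it runs last
nlp.add_pipe(sentencizer, first=True)  # inserted at the front, so it runs first
doc = nlp("The fox jumped. The dog slept")
print(doc["sent_lengths"])  # [3, 3]
```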

Parse documents as usual and retrieve the noun phrase annotations with the dedicated get_noun_phrases function:

from nlp_architect.pipelines.spacy_np_annotator import get_noun_phrases

doc = nlp('The quick brown fox jumped over the fence')
noun_phrases = get_noun_phrases(doc)

Standalone Spacy-NPAnnotator

For use cases that do not need a customized Spacy pipeline, we have implemented SpacyNPAnnotator. It runs a Spacy pipeline internally, takes documents as plain strings, and returns the noun phrase chunks as strings.
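Conceptually, a standalone annotator of this kind is a thin wrapper. It owns a configured pipeline and maps each input string to the text of the noun phrase spans found in it. The sketch below shows that wrapper pattern under stated assumptions. `StringNPWrapper` and `fake_pipeline` are hypothetical. The fake pipeline simply treats capitalized tokens as one-word phrases in place of a real chunker. This is not the library's implementation.

```python
from typing import Callable, List, Tuple

# A pipeline here is any callable mapping text to (tokens, spans),
# where spans are (start, end) token offsets with an exclusive end.
Pipeline = Callable[[str], Tuple[List[str], List[Tuple[int, int]]]]


class StringNPWrapper:
    """Hypothetical wrapper: string in, list of noun phrase strings out."""

    def __init__(self, pipeline: Pipeline) -> None:
        self.pipeline = pipeline  # built once, reused for every call

    def __call__(self, text: str) -> List[str]:
        tokens, spans = self.pipeline(text)
        return [" ".join(tokens[start:end]) for start, end in spans]


def fake_pipeline(text: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    # Stand-in for a real chunker: every capitalized token becomes a one-word phrase.
    tokens = text.split()
    spans = [(i, i + 1) for i, tok in enumerate(tokens) if tok[:1].isupper()]
    return tokens, spans


np_annotator = StringNPWrapper(fake_pipeline)
print(np_annotator("Alice met Bob in Paris"))  # ['Alice', 'Bob', 'Paris']
```

Storing the pipeline on the instance means any model loading happens once, in the constructor, rather than on every call.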

Usage example

As with NPAnnotator, we need to provide a trained SequenceChunker model and its parameters file. A specific Spacy model to base the pipeline on can also be provided.

The following example shows how to load a model and its parameters file using the default Spacy English model (en), and how to get the noun phrase annotations.

from nlp_architect.pipelines.spacy_np_annotator import SpacyNPAnnotator

spacy_np = SpacyNPAnnotator(<model_path>, <model_parameters_path>, spacy_model='en')
noun_phrases = spacy_np('The quick brown fox jumped over the fence')