Spacy-NP Annotator
Chunker-based noun phrase annotator
The noun phrase annotator is a plug-in that can be used within a Spacy pipeline.
The annotator loads a trained SequenceChunker model that predicts chunk labels, creates Spacy Span objects from those labels, and applies a sequence of filters to produce a set of noun phrases. Finally, it attaches the noun phrases to the document object.
The annotator implementation can be found in NPAnnotator.
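To make the step from chunk labels to spans more concrete, here is a minimal sketch. It assumes the chunker emits BIO-style labels such as B-NP and I-NP; the helper bio_to_spans and the hand-written labels below are illustrative only and are not part of NPAnnotator, which may group and filter spans differently.

```python
from typing import List, Tuple


def bio_to_spans(labels: List[str], target: str = "NP") -> List[Tuple[int, int]]:
    """Convert per-token BIO chunk labels into (start, end) token spans.

    `end` is exclusive, matching Python slicing. Only chunks of type
    `target` are returned. This is a hypothetical helper, not library code.
    """
    spans = []
    start = None  # index where the currently open chunk began, if any
    for i, label in enumerate(labels):
        begins = label == f"B-{target}"
        continues = label == f"I-{target}"
        # Any label other than I-<target> closes the open chunk.
        if start is not None and not continues:
            spans.append((start, i))
            start = None
        # B-<target> opens a chunk; a stray I-<target> is treated leniently
        # as the start of a new chunk.
        if begins or (continues and start is None):
            start = i
    if start is not None:
        spans.append((start, len(labels)))
    return spans


# Hand-written labels for illustration (not actual model output).
tokens = ["The", "quick", "brown", "fox", "jumped", "over", "the", "fence"]
labels = ["B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]

spans = bio_to_spans(labels)
phrases = [" ".join(tokens[s:e]) for s, e in spans]
print(spans)
print(phrases)
```

In a Spacy pipeline, token index pairs like these correspond to slices of the document (doc[start:end]), which is how Span objects are obtained before any filtering is applied.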
Usage example
Load a Spacy pipeline, add a sentence breaker (required) as the first component, and add NPAnnotator as the last component in the pipeline:
import spacy
from nlp_architect.pipelines.spacy_np_annotator import NPAnnotator

nlp = spacy.load('en')
nlp.add_pipe(nlp.create_pipe('sentencizer'), first=True)
nlp.add_pipe(NPAnnotator.load(<path_to_model>, <path_to_params>), last=True)
Parse documents as usual and retrieve the noun phrase annotations with the dedicated get_noun_phrases function:
from nlp_architect.pipelines.spacy_np_annotator import get_noun_phrases

doc = nlp('The quick brown fox jumped over the fence')
noun_phrases = get_noun_phrases(doc)
Standalone Spacy-NPAnnotator
For use cases that do not require a specialized Spacy pipeline, we have implemented SpacyNPAnnotator, which runs a Spacy pipeline internally and returns noun phrase chunks as strings, given documents in string format.
Usage example
Just as with NPAnnotator, we need to provide a trained SequenceChunker model and its parameters file. It is also possible to specify the Spacy model on which the pipeline is based.
The following example shows how to load a model and its parameters using the default Spacy English model (en) and how to get the noun phrase annotations.
from nlp_architect.pipelines.spacy_np_annotator import SpacyNPAnnotator

spacy_np = SpacyNPAnnotator(<model_path>, <model_parameters_path>, spacy_model='en')
noun_phrases = spacy_np('The quick brown fox jumped over the fence')