Chunker-based noun phrase annotator
The noun phrase annotator is a plug-in component for the spaCy pipeline.
The annotator loads a trained SequenceChunker model that predicts chunk labels, creates spaCy Span objects from those labels, and applies a sequence of filters to produce a set of noun phrases. Finally, it attaches the noun phrases to the document object.
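The label-to-phrase step described above can be sketched in plain Python. This is a minimal illustration, not the annotator's actual code: it assumes the chunker emits BIO-style labels such as B-NP and I-NP, and the helper names (bio_to_spans, filter_spans) and the two filters (dropping stop-word-only spans and duplicates) are hypothetical stand-ins for whatever filtering the annotator really applies.

```python
from typing import FrozenSet, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def bio_to_spans(labels: List[str], chunk_type: str = "NP") -> List[Span]:
    """Convert per-token BIO labels into token spans for one chunk type."""
    spans: List[Span] = []
    start = None
    for i, label in enumerate(labels):
        if label == f"I-{chunk_type}" and start is not None:
            continue  # the open chunk keeps growing
        if start is not None:
            spans.append((start, i))  # any other label closes the open chunk
            start = None
        if label == f"B-{chunk_type}":
            start = i  # a new chunk begins here
        # an I- label with no open chunk is ignored in this sketch
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def filter_spans(
    tokens: List[str],
    spans: List[Span],
    stop_words: FrozenSet[str] = frozenset({"the", "a", "an"}),
) -> List[Span]:
    """Assumed example filters: drop stop-word-only spans and duplicates."""
    seen = set()
    kept: List[Span] = []
    for start, end in spans:
        words = [t.lower() for t in tokens[start:end]]
        if all(w in stop_words for w in words):
            continue
        key = " ".join(words)
        if key in seen:
            continue
        seen.add(key)
        kept.append((start, end))
    return kept


tokens = "The quick brown fox jumped over the fence".split()
# Hand-written labels standing in for the model's predictions.
labels = ["B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]
spans = filter_spans(tokens, bio_to_spans(labels))
print([" ".join(tokens[s:e]) for s, e in spans])
# prints ['The quick brown fox', 'the fence']
```

In the real annotator the spans are spaCy Span objects rather than offset tuples; the tuples here only keep the sketch free of external dependencies.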
The annotator implementation can be found in
To use the annotator, load a spaCy pipeline, add a sentence breaker (required) as the first component, and add the NPAnnotator as the last component in the pipeline:
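import spacy
from nlp_architect.pipelines.spacy_np_annotator import NPAnnotator
# the sentence breaker must run first; the noun phrase annotator runs last
nlp = spacy.load('en')
nlp.add_pipe(nlp.create_pipe('sentencizer'), first=True)
nlp.add_pipe(NPAnnotator.load(<path_to_model>, <path_to_params>), last=True)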
Parse documents as usual and retrieve the noun phrase annotations with the dedicated get_noun_phrases method:
from nlp_architect.pipelines.spacy_np_annotator import get_noun_phrases
doc = nlp('The quick brown fox jumped over the fence')
noun_phrases = get_noun_phrases(doc)
For use cases that do not need a customized spaCy pipeline, we have implemented SpacyNPAnnotator, which runs a spaCy pipeline internally and returns string-based noun phrase chunks for documents given as strings.
The following example shows how to load a model and its parameters using the default spaCy English model (en) and how to get the noun phrase annotations:
from nlp_architect.pipelines.spacy_np_annotator import SpacyNPAnnotator
spacy_np = SpacyNPAnnotator(<model_path>, <model_parameters_path>, spacy_model='en')
noun_phrases = spacy_np('The quick brown fox jumped over the fence')
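The design behind this wrapper, a callable object that hides the pipeline and returns plain strings, can be illustrated with a small stand-alone sketch. Everything here is hypothetical: StringNPAnnotator and toy_pipeline are made-up names, and the toy pipeline returns hard-coded spans for one sentence instead of running a trained model.

```python
from typing import Callable, List, Tuple

# A pipeline maps raw text to (tokens, noun phrase spans); spans are
# (start, end) token offsets with an exclusive end.
Pipeline = Callable[[str], Tuple[List[str], List[Tuple[int, int]]]]


class StringNPAnnotator:
    """Hypothetical wrapper: takes a string, returns noun phrase strings."""

    def __init__(self, pipeline: Pipeline) -> None:
        # The pipeline is built once and reused for every call.
        self._pipeline = pipeline

    def __call__(self, text: str) -> List[str]:
        tokens, spans = self._pipeline(text)
        return [" ".join(tokens[start:end]) for start, end in spans]


def toy_pipeline(text: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Stand-in for a real pipeline: whitespace tokens, hard-coded spans."""
    tokens = text.split()
    # These spans are only valid for the example sentence used below.
    return tokens, [(0, 4), (6, 8)]


annotator = StringNPAnnotator(toy_pipeline)
print(annotator("The quick brown fox jumped over the fence"))
# prints ['The quick brown fox', 'the fence']
```

Keeping the pipeline behind a single callable means callers never touch document or span objects, which is the convenience the string-based annotator is meant to offer.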