Using spaCy, I extract aspect-opinion pairs from a text based on grammar rules that I defined. The rules are based on POS tags and dependency tags, which are obtained from token.pos_ and token.dep_. Below is an example of one of the grammar rules. If I pass the sentence "Japan is cool", it returns [('Japan', 'cool', 0.3182)], where the value represents the polarity of "cool".
However, I don't know how to make it recognise named entities. For example, if I pass "Air France is cool", I want to get [('Air France', 'cool', 0.3182)], but what I currently get is [('France', 'cool', 0.3182)].
I checked the spaCy online documentation and I know how to extract named entities (doc.ents), but I want to know what a possible workaround is to make my extractor work. Please note that I don't want a forced measure such as concatenating the strings into AirFrance, Air_France, etc.
Thank you!
import spacy
# Assumption: `sid` was not defined in the original snippet; the score 0.3182
# suggests NLTK's VADER analyzer (requires the vader_lexicon resource).
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nlp = spacy.load("en_core_web_lg-2.2.5")
sid = SentimentIntensityAnalyzer()

review_body = "Air France is cool."
doc = nlp(review_body)

rule3_pairs = []

for token in doc:
    children = token.children
    A = "999999"  # placeholder: no aspect found yet
    M = "999999"  # placeholder: no opinion found yet
    add_neg_pfx = False

    for child in children:
        if child.dep_ == "nsubj" and not child.is_stop:  # nsubj is nominal subject
            A = child.text

        if child.dep_ == "acomp" and not child.is_stop:  # acomp is adjectival complement
            M = child.text

        # example - 'this could have been better' -> (this, not better)
        if child.dep_ == "aux" and child.tag_ == "MD":  # MD is modal auxiliary
            neg_prefix = "not"
            add_neg_pfx = True

        if child.dep_ == "neg":  # neg is negation
            neg_prefix = child.text
            add_neg_pfx = True

    if add_neg_pfx and M != "999999":
        M = neg_prefix + " " + M

    if A != "999999" and M != "999999":
        rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
Result
rule3_pairs
>>> [('France', 'cool', 0.3182)]
Desired output
rule3_pairs
>>> [('Air France', 'cool', 0.3182)]
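
One possible direction that avoids concatenating strings: since doc.ents already exposes each entity as a span with start and end token indices, the rule could look up whether the matched subject token sits inside an entity and, if so, use the whole entity's text. The helper below is only a sketch of that idea. The name entity_text_for is hypothetical, and it assumes the loaded model actually tags "Air France" as a single entity, which I have not verified for this model version.

```python
def entity_text_for(token, doc):
    """Return the text of the named entity containing `token`,
    or the token's own text if it is not part of any entity.

    Hypothetical helper: relies only on Doc.ents, Span.start, Span.end,
    Span.text, Token.i and Token.text.
    """
    for ent in doc.ents:
        # ent.start is inclusive and ent.end is exclusive (token indices)
        if ent.start <= token.i < ent.end:
            return ent.text
    return token.text
```

In the rule above, A = child.text would then become A = entity_text_for(child, doc). spaCy also ships a built-in merge_entities pipeline component that merges each entity into a single token, which might achieve the same result without changing the rule itself.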