Extract parent and child nodes from a Python tree

2024/10/5 14:58:26

I am using nltk's Tree data structure. Below is a sample nltk.Tree.

(S(S(ADVP (RB recently))(NP (NN someone))(VP(VBD mentioned)(NP (DT the) (NN word) (NN malaria))(PP (TO to) (NP (PRP me)))))(, ,)(CC and)(IN so)(S(NP(NP (CD one) (JJ whole) (NN flood))(PP (IN of) (NP (NNS memories))))(VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back))))))(. .))

I am not familiar with the nltk.Tree data structure. I want to extract the parent and the grandparent ("super parent") label for every leaf node, e.g. for 'recently' I want (ADVP, RB), and for 'someone' I want (NP, NN). An earlier answer used the eval() function to do this, which I want to avoid. This is the final outcome I want:

[('ADVP', 'RB'), ('NP', 'NN'), ('VP', 'VBD'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'NN'), ('PP', 'TO'), ('NP', 'PRP'), ('S', 'CC'), ('S', 'IN'), ('NP', 'CD'), ('NP', 'JJ'), ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NNS'), ('VP', 'VBD'), ('VP', 'VBG'), ('ADVP', 'RB')]

Python code for the same, without using the eval function and using the nltk Tree data structure:

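import nltk

sentences = """(S(S
(ADVP (RB recently))
(NP (NN someone))
(VP(VBD mentioned)(NP (DT the) (NN word) (NN malaria))(PP (TO to) (NP (PRP me)))))(, ,)(CC and)(IN so)(S(NP(NP (CD one) (JJ whole) (NN flood))(PP (IN of) (NP (NNS memories))))(VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back))))))(. .))"""

def tails(items, path=()):
    # Walk the tree, tracking the labels on the path from the root to each node.
    for child in items:
        if isinstance(child, nltk.Tree):
            if child.label() in {".", ","}:  # ignore punctuation
                continue
            yield from tails(child, path + (child.label(),))
        else:
            # child is a leaf (a word): yield its grandparent and parent labels
            yield path[-2:]

# Parse the bracketed string into an nltk.Tree first; iterating over the raw
# string would only visit its characters. Seed the path with the root label so
# that direct children of the root (CC, IN) are paired with 'S'.
tree = nltk.Tree.fromstring(sentences)
print(list(tails(tree, (tree.label(),))))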
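As an alternative, here is a minimal sketch that assumes NLTK 3.x is installed and uses nltk.tree.ParentedTree, which keeps a reference from each subtree to its parent. Instead of carrying a path through the recursion, it picks out the preterminal nodes (subtrees of height 2, whose children are the words) and asks each one for its parent. The names SENTENCE, PUNCTUATION and parent_pairs are illustrative only and do not come from the question or the answer above.

```python
from nltk.tree import ParentedTree

# The sample parse from the question, split across lines for readability.
SENTENCE = (
    "(S(S(ADVP (RB recently))(NP (NN someone))"
    "(VP(VBD mentioned)(NP (DT the) (NN word) (NN malaria))"
    "(PP (TO to) (NP (PRP me)))))(, ,)(CC and)(IN so)"
    "(S(NP(NP (CD one) (JJ whole) (NN flood))(PP (IN of) (NP (NNS memories))))"
    "(VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back))))))(. .))"
)

PUNCTUATION = {".", ","}  # preterminal labels to skip (assumed set)


def parent_pairs(tree):
    """Return (grandparent label, parent label) for each non-punctuation leaf."""
    pairs = []
    # A subtree of height 2 is a preterminal: its children are leaves (words).
    for preterminal in tree.subtrees(lambda t: t.height() == 2):
        if preterminal.label() in PUNCTUATION:
            continue
        parent = preterminal.parent()
        if parent is None:  # the preterminal is itself the root; nothing to pair
            continue
        pairs.append((parent.label(), preterminal.label()))
    return pairs


pairs = parent_pairs(ParentedTree.fromstring(SENTENCE))
print(pairs)
```

Because subtrees() visits nodes in pre-order, the pairs come out in the same left-to-right order as the words in the sentence. The trade-off is that the string has to be parsed as a ParentedTree rather than a plain Tree.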

