I have the dataframe below. I want to build a rule engine that extracts tokens matching a pattern, e.g. "UNITED STATES". What is the best way to do this? Is there something like regex or CGUL for this kind of task? Any suggestions would be appreciated.
WORD_INDEX WORD_TOKEN WORD_POS
0 TRUMP PROPN
1 IS ADP
2 THE ADP
3 PRESIDENT NOUN
4 OF ADP
5 THE ADP
6 UNITED NOUN
7 STATES NOUN
I want to start from WORD_POS and find the matching WORD_TOKENs. Any idea how to do that? For example, I want to find the WORD_TOKENs where WORD_POS is NOUN and the WORD_POS of the next row is also NOUN.
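To make the NOUN-followed-by-NOUN rule concrete, here is a minimal sketch of one possible approach, assuming the data lives in a pandas DataFrame. The variable names (`df`, `is_noun`, `starts_pair`, `pairs`) are hypothetical, and comparing each row with a shifted copy of the column is just one way to express the rule, not necessarily the best one.

```python
import pandas as pd

# Rebuild the example frame from the question (df is an assumed name).
df = pd.DataFrame({
    "WORD_INDEX": range(8),
    "WORD_TOKEN": ["TRUMP", "IS", "THE", "PRESIDENT", "OF", "THE", "UNITED", "STATES"],
    "WORD_POS": ["PROPN", "ADP", "ADP", "NOUN", "ADP", "ADP", "NOUN", "NOUN"],
})

# True where the current row is tagged NOUN.
is_noun = df["WORD_POS"].eq("NOUN")

# True where the current row AND the following row are both NOUN.
# shift(-1) aligns each row with its successor; the last row has none.
starts_pair = is_noun & is_noun.shift(-1, fill_value=False)

# Join each matching token with the token that follows it.
next_token = df["WORD_TOKEN"].shift(-1)
pairs = (df.loc[starts_pair, "WORD_TOKEN"] + " " + next_token[starts_pair]).tolist()

print(pairs)  # ['UNITED STATES']
```

This only covers a fixed two-token pattern. For longer or more varied POS patterns, I assume something more general would be needed, which is really what I am asking about.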