Matching words with NLTK's chunk parser

2024/10/16 3:23:57

The regular expressions in NLTK's chunk parser can match POS tags, but can they also match specific words?
Suppose I want to chunk any structure consisting of a noun followed by the verb "left" (call this pattern L). For example, the sentence "the\DT dog\NN left\VB" should be chunked as
(S (DT the) (L (NN dog) (VB left))), but the sentence "the\DT dog\NN slept\VB" shouldn't be chunked at all.

I haven't been able to find any documentation on the chunking regex syntax, and all examples I've seen only match POS tags.


I had a similar problem, and after realizing that the regex pattern only examines tags, I changed the tag on the piece I was interested in.

For example, I was trying to match a product name and version. A chunk rule like <NNP>+<CD> worked for "Internet Explorer 8.0" but failed on "Internet Explorer 8.0 SP2", where SP2 was tagged as an NNP.

Perhaps I could have trained a POS tagger, but I decided instead to just change the tag on SP2 to SP. A chunk rule like <NNP>+<CD><SP>* then matches either example.
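
Here is a minimal sketch of that retagging trick, covering both the "left" case from the question and the product-version case. It assumes the sentences are already POS-tagged as (word, tag) pairs. The helper names (retag, chunks, chunk_left, chunk_product), the custom tags LEFT and SP, the chunk label PRODUCT, and the SP\d+ word pattern are my own choices for illustration, not anything NLTK defines. Note that NLTK tag patterns write each tag in angle brackets.

```python
import re

import nltk


def retag(tagged, matches, new_tag):
    """Return a copy of `tagged` where every word for which
    matches(word) is true gets `new_tag` instead of its POS tag."""
    return [(word, new_tag if matches(word) else tag) for word, tag in tagged]


def chunks(tree, label):
    """Collect the words of every chunk in `tree` that carries `label`."""
    return [
        [word for word, _tag in subtree.leaves()]
        for subtree in tree.subtrees()
        if subtree.label() == label
    ]


# Case from the question: a noun followed by the specific word "left".
# LEFT is a made-up tag that stands in for the word itself.
left_parser = nltk.RegexpParser("L: {<NN><LEFT>}")


def chunk_left(tagged):
    return left_parser.parse(
        retag(tagged, lambda word: word.lower() == "left", "LEFT")
    )


# Case from this answer: product name, version, optional service pack.
# SP is a made-up tag; the SP\d+ pattern is an assumption about the data.
product_parser = nltk.RegexpParser("PRODUCT: {<NNP>+<CD><SP>*}")


def chunk_product(tagged):
    return product_parser.parse(
        retag(tagged, lambda word: re.fullmatch(r"SP\d+", word) is not None, "SP")
    )


left_tree = chunk_left([("the", "DT"), ("dog", "NN"), ("left", "VB")])
slept_tree = chunk_left([("the", "DT"), ("dog", "NN"), ("slept", "VB")])

ie8_tree = chunk_product(
    [("Internet", "NNP"), ("Explorer", "NNP"), ("8.0", "CD")]
)
ie8_sp2_tree = chunk_product(
    [("Internet", "NNP"), ("Explorer", "NNP"), ("8.0", "CD"), ("SP2", "NNP")]
)

print(left_tree)
print(ie8_sp2_tree)
```

One side effect: the word inside the chunk now carries the stand-in tag (LEFT or SP) rather than its original POS tag. If later steps need the original tag, keep the original tagged list around and map it back by position.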

