Matching words with NLTK's chunk parser

2024/10/16 3:23:57

NLTK's chunk parser's regular expressions can match POS tags, but can they also match specific words?
Suppose I want to chunk any structure with a noun followed by the verb "left" (call this pattern L). For example, the sentence "the/DT dog/NN left/VB" should be chunked as
(S (DT the) (L (NN dog) (VB left))), but the sentence "the/DT dog/NN slept/VB" shouldn't be chunked at all.

I haven't been able to find any documentation on the chunking regex syntax, and all examples I've seen only match POS tags.

Answer

I had a similar problem. Once I realized that the regex pattern only examines tags, not the words themselves, I changed the tag on the piece I was interested in.
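Applied to the question's example, the retagging idea might look like the sketch below. The helper `retag_word` and the custom tag `LEFT` are made up for illustration and are not part of NLTK; only `RegexpParser` and its tag-pattern grammar come from the library.

```python
from nltk import RegexpParser


def retag_word(tagged, word, new_tag):
    """Return a copy of the tagged tokens with `word` given the tag `new_tag`.

    Hypothetical helper: the chunk grammar only sees tags, so the word we
    care about is given a tag of its own before chunking.
    """
    return [(w, new_tag if w.lower() == word else t) for w, t in tagged]


# Chunk a noun followed by the custom LEFT tag as pattern L.
parser = RegexpParser("L: {<NN><LEFT>}")

left_sentence = [("the", "DT"), ("dog", "NN"), ("left", "VB")]
slept_sentence = [("the", "DT"), ("dog", "NN"), ("slept", "VB")]

left_tree = parser.parse(retag_word(left_sentence, "left", "LEFT"))
slept_tree = parser.parse(retag_word(slept_sentence, "left", "LEFT"))

print(left_tree)   # "dog" and "left" are grouped under an L node
print(slept_tree)  # no L node, because no token carries the LEFT tag
```

Note that "left" carries the tag LEFT rather than VB in the resulting tree; if the original tag matters downstream, it would have to be restored after chunking.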

For example, I was trying to match a product name and version. A chunk rule like <NNP>+<CD> worked for "Internet Explorer 8.0" but failed on "Internet Explorer 8.0 SP2", where SP2 was tagged as an NNP.

Perhaps I could have trained a POS tagger, but I decided instead to just change the tag to SP. A chunk rule like <NNP>+<CD><SP>* then matches either example.
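A minimal sketch of that product-name case follows, with the rule written in NLTK's angle-bracket tag-pattern syntax. The POS tags are hard-coded rather than produced by a tagger, and the function `retag_service_packs`, the `SP\d+` test, and the chunk label `PRODUCT` are assumptions made for this example, not something the answer specifies.

```python
import re

from nltk import RegexpParser


def retag_service_packs(tagged):
    """Give tokens such as "SP2" the custom tag SP (hypothetical rule)."""
    return [(w, "SP" if re.fullmatch(r"SP\d+", w) else t) for w, t in tagged]


# One or more proper nouns, a number, then any number of SP tokens.
parser = RegexpParser("PRODUCT: {<NNP>+<CD><SP>*}")

plain = [("Internet", "NNP"), ("Explorer", "NNP"), ("8.0", "CD")]
with_sp = plain + [("SP2", "NNP")]  # tagger output assumed, as described above

plain_tree = parser.parse(retag_service_packs(plain))
sp_tree = parser.parse(retag_service_packs(with_sp))

for tree in (plain_tree, sp_tree):
    for chunk in tree.subtrees(lambda t: t.label() == "PRODUCT"):
        print(" ".join(word for word, _ in chunk.leaves()))
```

Because the retagging happens before parsing, the same grammar handles both inputs without retraining a tagger.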

https://en.xdnf.cn/q/69207.html
