How to obtain better results using NLTK pos tag

2024/10/3 6:35:37

I am just learning NLTK with Python. I tried running pos_tag on various sentences, but the results are not accurate. How can I improve them?

broke = NN
flimsy = NN
crap = NN

I am also getting a lot of extra words categorized as NN. How can I filter these out to get better results?

Answer

Please give the context in which you obtained these results. As an example, I get different results with pos_tag on the phrase "They broke flimsy crap":

import nltk
text=nltk.word_tokenize("They broke flimsy crap")
nltk.pos_tag(text)

[('They', 'PRP'), ('broke', 'VBP'), ('flimsy', 'JJ'), ('crap', 'NN')]

Anyway, if you find that a lot of words are wrongly categorized as 'NN', you can apply another technique specifically to the words tagged 'NN'. For instance, you can take an appropriate tagged corpus and train a trigram tagger on it (in the same way the authors do with bigrams at http://nltk.googlecode.com/svn/trunk/doc/book/ch05.html).

Something like this:

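import nltk
pos_tag_results = nltk.pos_tag(your_text)  # your_text is a list of tokens, tagged here with pos_tag
trigram_tagger = nltk.TrigramTagger(tagged_corpora)  # train a trigram tagger on your tagged sentences
trigram_tag_results = trigram_tagger.tag(your_text)  # the same tokens tagged with the trigram tagger
# For tokens tagged 'NN', take the trigram tagger's tag instead (tuples are immutable, so build a new one)
for i, (word, tag) in enumerate(pos_tag_results):
    if tag == 'NN' and trigram_tag_results[i][1] is not None:  # keep 'NN' when the trigram tagger has no answer
        pos_tag_results[i] = (word, trigram_tag_results[i][1])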
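A trigram tagger trained on its own returns None for any context it has not seen, so in practice it is usually chained with simpler taggers through the backoff argument. Below is a minimal sketch of that idea, not a tuned setup. The two training sentences, the helper name refine_nn, and the hard-coded pos_tag_results are made up for illustration; in real use you would train on a proper tagged corpus and take pos_tag_results from nltk.pos_tag.

```python
import nltk

# Hypothetical toy training data: a list of sentences, each a list of
# (word, tag) pairs. In practice, use a real tagged corpus instead.
tagged_corpora = [
    [("They", "PRP"), ("broke", "VBD"), ("the", "DT"), ("flimsy", "JJ"), ("chair", "NN")],
    [("I", "PRP"), ("am", "VBP"), ("broke", "JJ")],
]

# Backoff chain: trigram -> bigram -> unigram.
# Words never seen in training still come back tagged as None.
unigram_tagger = nltk.UnigramTagger(tagged_corpora)
bigram_tagger = nltk.BigramTagger(tagged_corpora, backoff=unigram_tagger)
trigram_tagger = nltk.TrigramTagger(tagged_corpora, backoff=bigram_tagger)


def refine_nn(pos_tag_results, second_opinion):
    """Replace 'NN' tags with the second tagger's tag when it has one."""
    refined = []
    for (word, tag), (_, other_tag) in zip(pos_tag_results, second_opinion):
        if tag == "NN" and other_tag is not None:
            tag = other_tag
        refined.append((word, tag))
    return refined


tokens = ["I", "am", "broke"]
# Assumed pos_tag output, for illustration only;
# replace with nltk.pos_tag(tokens) in real use.
pos_tag_results = [("I", "PRP"), ("am", "VBP"), ("broke", "NN")]
trigram_tag_results = trigram_tagger.tag(tokens)
print(refine_nn(pos_tag_results, trigram_tag_results))
```

On this toy data the chain tags "broke" after "am" as JJ, so the assumed 'NN' is overridden. Tokens the trained taggers have never seen keep their original pos_tag tag, because the helper ignores None.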

Let me know if it improves your results.

