NLTK Named Entity Recognition with Custom Data

2024/11/20 10:28:59

I'm trying to extract named entities from my text using NLTK. I find that NLTK NER is not very accurate for my purpose and I want to add some more tags of my own as well. I've been trying to find a way to train my own NER, but I don't seem to be able to find the right resources. I have a couple of questions regarding NLTK-

  1. Can I use my own data to train an Named Entity Recognizer in NLTK?
  2. If I can train using my own data, is the named_entity.py the file to be modified?
  3. Does the input file format have to be in IOB eg. Eric NNP B-PERSON ?
  4. Are there any resources - apart from the nltk cookbook and nlp with python that I can use?

I would really appreciate help in this regard

Answer

Are you committed to using NLTK/Python? I ran into the same problems as you, and had much better results using Stanford's named-entity recognizer: http://nlp.stanford.edu/software/CRF-NER.shtml. The process for training the classifier using your own data is very well-documented in the FAQ.

If you really need to use NLTK, I'd hit up the mailing list for some advice from other users: http://groups.google.com/group/nltk-users.

Hope this helps!

https://en.xdnf.cn/q/26333.html

Related Q&A

How do I write to the console in Google App Engine?

Often when I am coding I just like to print little things (mostly the current value of variables) out to console. I dont see anything like this for Google App Engine, although I note that the Google Ap…

Does Google App Engine support Python 3?

I started learning Python 3.4 and would like to start using libraries as well as Google App Engine, but the majority of Python libraries only support Python 2.7 and the same with Google App Engine.Shou…

how to subquery in queryset in django?

how can i have a subquery in djangos queryset? for example if i have:select name, age from person, employee where person.id = employee.id and employee.id in (select id from employee where employee.com…

Opening sqlite3 database from python in read-only mode

While using sqlite3 from C/C++ I learned that it has a open-in-read-only mode option, which is very handy to avoid accidental data-corruption. Is there such a thing in the Python binding?

SyntaxError: Generator expression must be parenthesized

I just installed django and after installing that I created a django project and was trying to run django server by command:python manage.py runserverAfter that Iam getting error as: SyntaxError: Gene…

ValueError: max() arg is an empty sequence

Ive created a GUI using wxFormBuilder that should allow a user to enter the names of "visitors to a business" into a list and then click one of two buttons to return the most frequent and lea…

How to write native newline character to a file descriptor in Python?

The os.write function can be used to writes bytes into a file descriptor (not file object). If I execute os.write(fd, \n), only the LF character will be written into the file, even on Windows. I would …

pandas combine two columns with null values

I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new column to also have NaN. H…

Python equivalent of zip for dictionaries

If I have these two lists:la = [1, 2, 3] lb = [4, 5, 6]I can iterate over them as follows:for i in range(min(len(la), len(lb))):print la[i], lb[i]Or more pythonicallyfor a, b in zip(la, lb):print a, bW…

Can I prevent fabric from prompting me for a sudo password?

I am using Fabric to run commands on a remote server. The user with which I connect on that server has some sudo privileges, and does not require a password to use these privileges. When SSHing into th…