Setting up NLTK with Stanford NLP (both StanfordNERTagger and StanfordPOSTagger) for Spanish

2024/10/7 0:25:11

The NLTK documentation is rather thin on this integration. The steps I followed were:

  • Download to /home/me/stanford

  • Download to /home/me/stanford

Then in an IPython console:

In [11]: import nltk

In [12]: nltk.__version__
Out[12]: '3.1'

In [13]: from nltk.tag import StanfordNERTagger


st = StanfordNERTagger('/home/me/stanford/', '/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar')

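But when I tried to run it, Java reported that it could not find or load the main class (the first line of the output below is in Spanish because of my locale):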

st.tag('Adolfo se la pasa corriendo'.split())
Error: no se ha encontrado o cargado la clase principal
OSError                                   Traceback (most recent call last)
<ipython-input-14-0c1a96b480a6> in <module>()
----> 1 st.tag('Adolfo se la pasa corriendo'.split())

/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/tag/ in tag(self, tokens)
     64     def tag(self, tokens):
     65         # This function should return list of tuple rather than list of list
---> 66         return sum(self.tag_sents([tokens]), [])
     67 
     68     def tag_sents(self, sentences):

/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/tag/ in tag_sents(self, sentences)
     87         # Run the tagger and get the output
     88         stanpos_output, _stderr = java(cmd, classpath=self._stanford_jar,
---> 89                                                        stdout=PIPE, stderr=PIPE)
     90         stanpos_output = stanpos_output.decode(encoding)
     91 

/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/ in java(cmd, classpath, stdin, stdout, stderr, blocking)
    132     if p.returncode != 0:
    133         print(_decode_stdoutdata(stderr))
--> 134         raise OSError('Java command failed : ' + str(cmd))
    135 
    136     return (stdout, stderr)

OSError: Java command failed : ['/usr/bin/java', '-mx1000m', '-cp', '/home/nanounanue/Descargas/stanford-spanish-corenlp-2015-01-08-models.jar', '', '-loadClassifier', '/home/nanounanue/Descargas/', '-textFile', '/tmp/tmp6y169div', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']

The same occurs with the StanfordPOSTagger.

NOTE: I need this to work with the Spanish models. NOTE: I am running this on Python 3.4.3.



# StanfordPOSTagger
from nltk.tag.stanford import StanfordNERTagger, StanfordPOSTagger

stanford_dir = '/home/me/stanford/stanford-postagger-full-2015-04-20/'
modelfile = stanford_dir + 'models/english-bidirectional-distsim.tagger'
jarfile = stanford_dir + 'stanford-postagger.jar'
st = StanfordPOSTagger(model_filename=modelfile, path_to_jar=jarfile)

# NERTagger
stanford_dir = '/home/me/stanford/stanford-ner-2015-04-20/'
jarfile = stanford_dir + 'stanford-ner.jar'
modelfile = stanford_dir + 'classifiers/english.all.3class.distsim.crf.ser.gz'
st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
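The call in the question appears to pass a directory as the model and the Spanish models jar as the tagger jar, which would explain why Java cannot find its main class. A small sanity check before constructing the tagger can surface that kind of mix-up early. The sketch below is only an illustration: check_stanford_paths and build_ner_tagger are hypothetical helper names, not part of NLTK, and the checks are deliberately shallow (they confirm that both paths are files, not that the jar contains the right classes).

```python
import os


def check_stanford_paths(model_filename, path_to_jar):
    """Return a list of problems found; an empty list means the paths look sane."""
    problems = []
    # The model must be a single file (e.g. a .tagger or .ser.gz), not a directory.
    if not os.path.isfile(model_filename):
        problems.append("model_filename is not a file: " + model_filename)
    # The jar must be an existing file whose name ends in .jar.
    if not os.path.isfile(path_to_jar):
        problems.append("path_to_jar is not a file: " + path_to_jar)
    elif not path_to_jar.endswith(".jar"):
        problems.append("path_to_jar does not end in .jar: " + path_to_jar)
    return problems


def build_ner_tagger(model_filename, path_to_jar):
    """Construct a StanfordNERTagger only after the paths pass the checks."""
    problems = check_stanford_paths(model_filename, path_to_jar)
    if problems:
        raise ValueError("; ".join(problems))
    # Imported lazily so the path check itself works without NLTK installed.
    from nltk.tag.stanford import StanfordNERTagger
    return StanfordNERTagger(model_filename=model_filename, path_to_jar=path_to_jar)
```

With the paths from the question, check_stanford_paths('/home/me/stanford/', ...) would report that the model is not a file, which points at the first argument rather than at Java.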

For detailed information on the NLTK API for the Stanford tools, take a look at:

Note: The NLTK APIs are for the individual Stanford tools. If you're using Stanford Core NLP, it's best to follow @dimazest's instructions on


As for Spanish NER tagging, I strongly suggest that you use Stanford Core NLP instead of the Stanford NER package, and follow @dimazest's solution for JSON file reading.

Alternatively, if you must use the NER package, you can try following the instructions from (Disclaimer: This repo is not affiliated with NLTK officially). Do the following on the Unix command line:

cd /home/me/stanford
unzip stanford-spanish-corenlp-2015-01-08-models.jar -d stanford-spanish
cp stanford-spanish/edu/stanford/nlp/models/ner/* /home/me/stanford/stanford-ner-2015-04-20/classifiers/
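If you would rather do the extraction from Python, the same step can be sketched with the standard zipfile module, since a jar is a zip archive. This is a minimal sketch, not part of NLTK or the Stanford tools: extract_ner_models is a hypothetical helper name, and it assumes the NER models sit directly under edu/stanford/nlp/models/ner/ inside the jar, as the unzip and cp commands imply.

```python
import os
import shutil
import zipfile

# Location of the NER models inside the jar, as used by the cp command above.
NER_PREFIX = "edu/stanford/nlp/models/ner/"


def extract_ner_models(jar_path, dest_dir, prefix=NER_PREFIX):
    """Copy the files stored directly under `prefix` in the jar into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            relative = name[len(prefix):]
            # Skip other folders, directory entries, and anything nested deeper.
            if not name.startswith(prefix) or not relative or "/" in relative:
                continue
            target = os.path.join(dest_dir, relative)
            with jar.open(name) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            copied.append(target)
    return copied
```

For example, extract_ner_models('/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar', '/home/me/stanford/stanford-ner-2015-04-20/classifiers/') would place the Spanish classifiers where the Python snippet expects them.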

Then in python:

# NERTagger
from nltk.tag.stanford import StanfordNERTagger

stanford_dir = '/home/me/stanford/stanford-ner-2015-04-20/'
jarfile = stanford_dir + 'stanford-ner.jar'
modelfile = stanford_dir + 'classifiers/spanish.ancora.distsim.s512.crf.ser.gz'
st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
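Once the tagger loads, st.tag(tokens) returns a list of (token, tag) tuples, one per token. A common next step is to merge consecutive tokens that share a tag into whole entities. The sketch below shows one way to do that. Note the assumptions: group_entities is a hypothetical helper, the tagged list is made-up sample data shaped like the tagger's output (not real output from the Spanish model), and the label names PERS and LUG are only illustrative, so check what your model actually emits. The background label 'O' is assumed to mark tokens outside any entity.

```python
from itertools import groupby


def group_entities(tagged_tokens, background="O"):
    """Merge consecutive tokens that share a non-background tag into (text, tag) pairs."""
    entities = []
    # groupby yields one run per stretch of adjacent tokens with the same tag.
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != background:
            entities.append((" ".join(token for token, _ in run), tag))
    return entities


# Hypothetical sample shaped like the (token, tag) list that st.tag(...) returns.
tagged = [("Adolfo", "PERS"), ("se", "O"), ("la", "O"), ("pasa", "O"),
          ("corriendo", "O"), ("en", "O"), ("Buenos", "LUG"), ("Aires", "LUG")]
print(group_entities(tagged))
# → [('Adolfo', 'PERS'), ('Buenos Aires', 'LUG')]
```

One limitation of this simple grouping: two different entities of the same type that appear back to back are merged into one, because the tags alone carry no boundary information.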

