NLTK: Package Errors? punkt and pickle?

2024/10/8 4:27:35

Errors on Command Prompt

Basically, I have no idea why I'm getting this error.

To have more than an image, here is a similar error message as text. Because it comes from a more recent run, the message itself already mentions the fix given in this thread's answer:

Preprocessing raw texts ...
---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-38-263240bbee7e> in <module>()
----> 1 main()

7 frames
<ipython-input-32-62fa346501e8> in main()
     32     data = data.fillna('')  # only the comments has NaN's
     33     rws = data.abstract
---> 34     sentences, token_lists, idx_in = preprocess(rws, samp_size=samp_size)
     35     # Define the topic model object
     36     #tm = Topic_Model(k = 10), method = TFIDF)

<ipython-input-31-f75213289788> in preprocess(docs, samp_size)
     25     for i, idx in enumerate(samp):
     26         sentence = preprocess_sent(docs[idx])
---> 27         token_list = preprocess_word(sentence)
     28         if token_list:
     29             idx_in.append(idx)

<ipython-input-29-eddacbfa6443> in preprocess_word(s)
    179     if not s:
    180         return None
--> 181     w_list = word_tokenize(s)
    182     w_list = f_punct(w_list)
    183     w_list = f_noun(w_list)

/usr/local/lib/python3.7/dist-packages/nltk/tokenize/__init__.py in word_tokenize(text, language, preserve_line)
    126     :type preserver_line: bool
    127     """
--> 128     sentences = [text] if preserve_line else sent_tokenize(text, language)
    129     return [token for sent in sentences
    130             for token in _treebank_word_tokenizer.tokenize(sent)]

/usr/local/lib/python3.7/dist-packages/nltk/tokenize/__init__.py in sent_tokenize(text, language)
     92     :param language: the model name in the Punkt corpus
     93     """
---> 94     tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
     95     return tokenizer.tokenize(text)
     96

/usr/local/lib/python3.7/dist-packages/nltk/data.py in load(resource_url, format, cache, verbose, logic_parser, fstruct_reader, encoding)
    832
    833     # Load the resource.
--> 834     opened_resource = _open(resource_url)
    835
    836     if format == 'raw':

/usr/local/lib/python3.7/dist-packages/nltk/data.py in _open(resource_url)
    950
    951     if protocol is None or protocol.lower() == 'nltk':
--> 952         return find(path_, path + ['']).open()
    953     elif protocol.lower() == 'file':
    954         # urllib might not use mode='rb', so handle this one ourselves:

/usr/local/lib/python3.7/dist-packages/nltk/data.py in find(resource_name, paths)
    671     sep = '*' * 70
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674
    675

LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************

Answer

Perform the following:

>>> import nltk
>>> nltk.download()

Then, when the downloader window pops up, select punkt in the Identifier column, which is located in the Models tab.
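
If no GUI is available (for example in a notebook or on a server), the same download can be done without the popup, as the error message itself suggests with nltk.download('punkt'). Below is a minimal sketch of that idea; the helper name ensure_nltk_resource and its find/download parameters are my own, not part of NLTK. It only relies on nltk.data.find raising LookupError for a missing resource, which is the behavior shown in the traceback above.

```python
def ensure_nltk_resource(resource_path, package_id, find=None, download=None):
    """Download `package_id` only if `resource_path` is not already installed.

    `find` and `download` are hypothetical hooks (not NLTK parameters) that
    default to nltk.data.find and nltk.download; they exist so the logic can
    be exercised without network access.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the real NLTK calls are needed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing,
        # exactly as in the traceback above.
        find(resource_path)
        return False
    except LookupError:
        download(package_id)
        return True
```

Calling ensure_nltk_resource("tokenizers/punkt", "punkt") once before word_tokenize should be enough for the setup in the question. Running python -m nltk.downloader punkt from the command prompt is another option. Note that newer NLTK releases may ask for a resource named punkt_tab instead of punkt; in that case, pass whatever identifier the error message names.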


https://en.xdnf.cn/q/70158.html
