Python child process silently crashes when issuing an HTTP request

2024/10/18 12:17:25

I'm running into an issue when combining multiprocessing, requests (or urllib2) and nltk. Here is a minimal example:

>>> from multiprocessing import Process
>>> import requests
>>> from pprint import pprint
>>> Process(target=lambda: pprint(requests.get('https://api.github.com'))).start()
>>> <Response [200]>  # this is the response displayed by the call to `pprint`.

In more detail, this piece of code does the following:

  1. Import a few required modules
  2. Start a child process
  3. Issue an HTTP GET request to 'api.github.com' from the child process
  4. Display the result

This works as expected. The problem appears once nltk is imported:

>>> import nltk
>>> Process(target=lambda: pprint(requests.get('https://api.github.com'))).start()
>>> # nothing happens!

After importing NLTK, the call to requests silently crashes the child process. (If you use a named function instead of the lambda and add a few print statements before and after the call, you'll see that execution stops right at the call to requests.get.) Does anybody have any idea what in NLTK could explain this behavior, and how to overcome the issue?

Here are the versions I'm using:

$> python --version
Python 2.7.5
$> pip freeze | grep nltk
nltk==2.0.5
$> pip freeze | grep requests
requests==2.2.1

I'm running Mac OS X v. 10.9.5.

Thanks!

Answer

Updating Python and your Python libraries should resolve the problem. My environment:

alvas@ubi:~$ pip freeze | grep nltk
nltk==3.0.3
alvas@ubi:~$ pip freeze | grep requests
requests==2.7.0
alvas@ubi:~$ python --version
Python 2.7.6
alvas@ubi:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.2 LTS
Release:    14.04
Codename:   trusty
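To compare your own environment with the one above from inside Python, a small helper along these lines could be used. This is only a sketch, assuming Python 3.8 or later (for importlib.metadata); installed_version is a hypothetical helper name and not part of nltk or requests.

```python
import platform
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("python", platform.python_version())
    for name in ("nltk", "requests"):
        print(name, installed_version(name))
```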

From a script:

from multiprocessing import Process
import nltk
import time

def child_fn():
    print "Fetch URL"
    import urllib2
    print urllib2.urlopen("https://www.google.com").read()[:100]
    print "Done"

while True:
    child_process = Process(target=child_fn)
    child_process.start()
    child_process.join()
    print "Child process returned"
    time.sleep(1)

[out]:

Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned
Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned
Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned

From the interactive interpreter:

alvas@ubi:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Process
>>> import requests
>>> from pprint import pprint
>>> Process(target=lambda: pprint(
...         requests.get('https://api.github.com'))).start()
>>> <Response [200]>
>>> import nltk
>>> Process(target=lambda: pprint(
...         requests.get('https://api.github.com'))).start()
>>> <Response [200]>

It should work with Python 3 too:

alvas@ubi:~$ python3
Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Process
>>> import requests
>>> Process(target=lambda: print(requests.get('https://api.github.com'))).start()
>>> 
>>> <Response [200]>
>>> import nltk
>>> Process(target=lambda: print(requests.get('https://api.github.com'))).start()
>>> <Response [200]>