How to assign python requests sessions for single processes in multiprocessing pool?

2024/9/23 15:31:48

Considering the following code example:

    import multiprocessing
    import requests

    session = requests.Session()
    data_to_be_processed = [...]

    def process(arg):
        # do stuff with arg and get url
        response = session.get(url)
        # process response and generate data...
        return data

    with multiprocessing.Pool() as pool:
        results = pool.map(process, data_to_be_processed)

In this example, the Session is assigned to a global variable, so once the Pool creates its processes it will be copied into each subprocess. I am not sure whether the session is thread-safe or how pooling in the session works, so I would like to assign a separate session object to each process in the pool.

I am aware that I could just use requests.get(url) instead of session.get(url), but I would like to work with a session, and I am also considering using requests-html (https://html.python-requests.org/).

I am not very familiar with Python's multiprocessing. So far I have used only Pool, because it seemed to me the best way to process data in parallel without a critical section, so I am open to different solutions.

Is there a clean and straightforward way to do it?

Answer

Short answer: you can use the global namespace to share data between the initializer and func:

    import multiprocessing
    import requests

    session = None
    data_to_be_processed = [...]

    def init_process():
        global session
        session = requests.Session()

    def process(arg):
        global session
        # do stuff with arg and get url
        response = session.get(url)
        # process response and generate data...
        return data

    with multiprocessing.Pool(initializer=init_process) as pool:
        results = pool.map(process, data_to_be_processed)

Long answer: Python starts child processes with one of three possible start methods (spawn, fork, or forkserver). All of them keep memory objects separate between the parent process and the child processes. In our case, that means changes made to the global namespace in processes run by Pool() will not propagate back to the parent process, nor to sibling processes.
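A small sketch can make this isolation visible. It is not from the original answer: `counter`, `bump`, and `demo` are made-up names, and the fork start method is requested explicitly, which assumes a Unix-like system. Each worker increments its own copy of a global, and the parent's copy stays untouched.

```python
import multiprocessing
import os

counter = 0  # module-level global; every process works on its own copy


def bump(_):
    """Increment this process's copy of counter and report who did it."""
    global counter
    counter += 1
    return os.getpid(), counter


def demo():
    # Assumption: a Unix-like OS where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=2) as pool:
        results = pool.map(bump, range(8))
    # counter here is the parent's copy; the workers never modified it.
    return results, counter
```

Calling `demo()` returns the parent's `counter` still at 0, while each worker pid in the results counts up from 1 independently of its sibling.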

For object destruction we can rely on the garbage collector, which steps in once a child process finishes its work. Because multiprocessing.Pool() offers no explicit closing hook to pair with the initializer, this approach cannot be used with objects that the GC cannot destroy (like the Pool() itself; see the warning in the multiprocessing documentation). Judging from the requests docs, it is perfectly OK to use requests.Session without calling close() on it explicitly.

https://en.xdnf.cn/q/71811.html
