Use of initializer in Python multiprocessing worker pool

2024/9/27 12:12:01

I was looking into multiprocessing.Pool for workers, trying to initialize each worker with some state. The pool can take a callable, initializer, but that callable isn't passed a reference to the initialized worker. The few examples I've seen that use it rely on global variables, which seems really nasty.

Is there any good way to initialize worker state using multiprocessing.Pool?

Edit: An example:

I have workers, each of which does a bit of relatively expensive initialisation (binding to a socket), which I don't want to repeat for every task. I could initialize my sockets by hand, then pass them in when I assign work, but sharing file descriptors across processes is complicated, if not impossible. So I would have to initialize and bind every time I wanted to process a request.

Answer

Technically speaking, the right thing to do would be to have the result of the initialization function passed as an argument to every function executed by the worker. multiprocessing.Pool does not do this: the return value of the initializer is discarded.

It is also true that global variables are fine and safe in this context, since by construction they are private objects living in the separate address spaces of different processes.

My general suggestion is to write your functions in a sane, reentrant style, and to allow global variables only at the point where you hook into multiprocessing.

Keeping your example, the following send function requires some context (in this case, a socket):

def send(socket, data):
    pass  # ... your code here
    return dust

The initialization code and the base code executed by the worker will rely on global variables for convenience.

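import multiprocessing

socket = None  # per-process global, set once by init() in each worker

def init(address, port):
    global socket
    socket = magic(address, port)

def job(data):
    global socket
    assert socket is not None
    return send(socket, data)

# init(address, port) runs once in every worker process
pool = multiprocessing.Pool(N, init, [address, port])
pool.map(job, ['foo', 'bar', 'baz'])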

Coded this way, it becomes simple and natural to test without multiprocessing: call init once in the current process, then call job directly. You can think of your global state as a perfectly safe context capsule.
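To make the testing point concrete, here is a minimal sketch of the same pattern that runs as written. FakeConnection is a hypothetical stand-in for the bound socket, and the names init_worker, run_in_pool and the default address and port are assumptions made for illustration, not part of the original answer.

```python
import multiprocessing

# Per-process state: each worker holds its own copy, set once by init_worker().
_connection = None


class FakeConnection:
    """Hypothetical stand-in for an expensive resource such as a bound socket."""

    def __init__(self, address, port):
        self.address = address
        self.port = port

    def send(self, data):
        return f"{self.address}:{self.port} <- {data}"


def init_worker(address, port):
    """Runs once in each worker process when the pool starts it."""
    global _connection
    _connection = FakeConnection(address, port)


def job(data):
    """Runs for every task and reuses the connection made by init_worker()."""
    assert _connection is not None, "init_worker() was not called"
    return _connection.send(data)


def run_in_pool(items, address="localhost", port=9000, workers=2):
    """Distribute the same job across worker processes."""
    with multiprocessing.Pool(workers, init_worker, (address, port)) as pool:
        return pool.map(job, items)
```

Without any pool, calling init_worker("localhost", 9000) followed by job("foo") exercises exactly the code the workers run. To use the pool, call run_in_pool(['foo', 'bar', 'baz']) from under an if __name__ == "__main__": guard, which the spawn start method needs.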

As an additional point of convenience, keep in mind that multiprocessing is not very good at sending complex data around (e.g. callbacks). The best approach is to send simple pieces of data (strings, lists, dictionaries, collections.namedtuple ...) and reconstruct the complex data structures on the worker side (using the initialization function).

