I was looking into multiprocessing.Pool for workers, trying to initialize the workers with some state. The pool can take a callable, initializer, but it isn't passed a reference to the initialized worker. The few examples that I've seen using it rely on global variables, which seems really nasty.
Is there any good way to initialize worker state using multiprocessing.Pool?
Edit: An example:
I have workers, each of which does a bit of relatively expensive initialisation (binding to a socket), which I don't want to have to do every time. I could initialize my sockets by hand, then pass them in when I assign work, but sharing file descriptors across processes is complicated, if not impossible. So I would have to initialize and bind every time I wanted to process a request.
Technically speaking, the right thing to do would be to have the result of the initialization function passed as an argument to every function executed by the worker.
It's also true that in this context it is fine and safe to have global variables, since by construction they end up as private objects, each living in the separate address space of a different process.
My general suggestion is to write your functions in a sane, reentrant style, and to allow global variables only where you hook those functions into multiprocessing.
Keeping your example, the following send function requires some context (in this case, a socket):
def send(socket, data):
    pass  # ... your code here
    return dust
The initialization code and the base code executed by the worker will rely on global variables for convenience.
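import multiprocessing

socket = None  # set once per worker process by init()

def init(address, port):
    global socket
    socket = magic(address, port)  # magic() stands for your own setup code

def job(data):
    global socket
    assert socket is not None
    return send(socket, data)

# N, address and port are placeholders for your own values
pool = multiprocessing.Pool(N, init, [address, port])
pool.map(job, ['foo', 'bar', 'baz'])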
Coded this way, it becomes simple and natural to test everything without multiprocessing: call init once, then call job directly. You can think of your global state as a perfectly safe context capsule.
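To make the testing point concrete, here is a minimal sketch of the same pattern that runs as-is. FakeConnection is a hypothetical stand-in for the real socket setup, and the names _conn and run_in_pool, as well as the address and port, are made up for illustration. The same init and job functions are called directly first, then handed to a pool.

```python
import multiprocessing

_conn = None  # per-process state, set once by init()


class FakeConnection:
    """Hypothetical stand-in for a bound socket; records what it is asked to send."""

    def __init__(self, address, port):
        self.address = address
        self.port = port
        self.sent = []

    def send(self, data):
        self.sent.append(data)
        return len(data)


def init(address, port):
    """Runs once per worker process (or once in a test) to build the state."""
    global _conn
    _conn = FakeConnection(address, port)


def job(data):
    """Does one unit of work using the state prepared by init()."""
    assert _conn is not None, "init() must run before job()"
    return _conn.send(data)


def run_in_pool(items, processes=2):
    """Drives the same init/job pair through a real pool."""
    with multiprocessing.Pool(processes, init, ("127.0.0.1", 9000)) as pool:
        return pool.map(job, items)


if __name__ == "__main__":
    # In-process check: no pool, no child processes.
    init("127.0.0.1", 9000)
    print([job(item) for item in ["foo", "bar", "baz"]])
    # Pool run: each worker calls init() once, then handles many jobs.
    print(run_in_pool(["foo", "bar", "baz"]))
```

Because job only ever reads the module-level state, a unit test can call init with test values and then exercise job without starting a single child process.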
As an additional point of convenience, keep in mind that multiprocessing pickles everything it sends between processes, so it is not very good at sending complex data around (e.g. callbacks). The best approach is to send simple pieces of data (strings, lists, dictionaries, collections.namedtuple...) and reconstruct the complex data structures on the worker side (using the initialization function).
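As a rough sketch of that last idea: instead of sending a callback to the workers, send a small namedtuple that describes it, and let the initialization function rebuild the callback in each process. WorkerConfig, OPERATIONS, _handler and run are all hypothetical names chosen for this example.

```python
import multiprocessing
from collections import namedtuple

# Plain, picklable description of what each worker should build.
WorkerConfig = namedtuple("WorkerConfig", ["operation", "factor"])

# Table of callback factories. It lives at module level, so every process
# has its own copy and the lambdas themselves never need to be pickled.
OPERATIONS = {
    "scale": lambda factor: (lambda x: x * factor),
    "shift": lambda factor: (lambda x: x + factor),
}

_handler = None  # the reconstructed callback, private to each process


def init(config):
    """Rebuilds the complex object (a closure) from the simple config."""
    global _handler
    _handler = OPERATIONS[config.operation](config.factor)


def job(value):
    """Applies the callback that init() built in this process."""
    assert _handler is not None, "init() must run before job()"
    return _handler(value)


def run(config, values, processes=2):
    """Sends only the config and plain values across the process boundary."""
    with multiprocessing.Pool(processes, init, (config,)) as pool:
        return pool.map(job, values)


if __name__ == "__main__":
    print(run(WorkerConfig("scale", 10), [1, 2, 3]))
```

Only the namedtuple and the integers cross the process boundary here; the closure is created after the worker has started, which sidesteps the fact that the standard pickle module cannot serialize lambdas.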