How to keep track of status with multiprocessing and pool.map?

2024/9/30 17:24:42

I'm setting up a multiprocessing module for the first time, and basically, I am planning to do something along the lines of

from multiprocessing import Pool
pool = Pool(processes=102)
results = pool.map(whateverFunction, myIterable)
print(1)

As I understand it, 1 will be printed as soon as all the processes have come back and results is complete. I would like to have some status update on these. What is the best way of implementing that?

I'm hesitant to make whateverFunction() print. With around 200 values, I would get something like 'process done' printed 200 times, which is not very useful.

I expect output like

10% of myIterable done
20% of myIterable done

Answer

pool.map blocks until all the concurrent function calls have completed. pool.apply_async does not block. Moreover, you could use its callback parameter to report on progress. The callback function, log_result, is called once each time foo completes. It is passed the value returned by foo.

from __future__ import division
import multiprocessing as mp
import time

def foo(x):
    time.sleep(0.1)
    return x

def log_result(retval):
    results.append(retval)
    if len(results) % (len(data)//10) == 0:
        print('{:.0%} done'.format(len(results)/len(data)))

if __name__ == '__main__':
    pool = mp.Pool()
    results = []
    data = range(200)
    for item in data:
        pool.apply_async(foo, args=[item], callback=log_result)
    pool.close()
    pool.join()
    print(results)

yields

10% done
20% done
30% done
40% done
50% done
60% done
70% done
80% done
90% done
100% done
[0, 1, 2, 3, ..., 197, 198, 199]

The log_result function above modifies the global variable results and reads the global variable data. You cannot pass these variables to log_result directly, because the callback function specified in pool.apply_async is always called with exactly one argument: the return value of foo.

You can, however, make a closure, which at least makes clear what variables log_result depends on:

from __future__ import division
import multiprocessing as mp
import time

def foo(x):
    time.sleep(0.1)
    return x

def make_log_result(results, len_data):
    def log_result(retval):
        results.append(retval)
        if len(results) % (len_data//10) == 0:
            print('{:.0%} done'.format(len(results)/len_data))
    return log_result

if __name__ == '__main__':
    pool = mp.Pool()
    results = []
    data = range(200)
    for item in data:
        pool.apply_async(foo, args=[item], callback=make_log_result(results, len(data)))
    pool.close()
    pool.join()
    print(results)
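
Instead of writing the closure by hand, the extra arguments can probably be bound with functools.partial from the standard library. The following is only a sketch under the same assumptions as the examples above (a placeholder foo and 200 items). The three-argument log_result signature, the main() wrapper, and the max(1, ...) guard for short inputs are illustrative choices, not part of the original answer.

```python
from __future__ import division
from functools import partial
import multiprocessing as mp
import time


def foo(x):
    # Placeholder for real work, as in the examples above.
    time.sleep(0.01)
    return x


def log_result(results, total, retval):
    # The pool calls the callback with one argument (retval);
    # results and total are bound beforehand with functools.partial.
    results.append(retval)
    step = max(1, total // 10)  # assumption: avoid a zero step when total < 10
    if len(results) % step == 0:
        print('{:.0%} done'.format(len(results) / total))


def main():
    data = range(200)
    results = []
    # partial(...) returns a callable that still accepts exactly one argument.
    callback = partial(log_result, results, len(data))
    pool = mp.Pool()
    for item in data:
        pool.apply_async(foo, args=[item], callback=callback)
    pool.close()
    pool.join()
    return results


if __name__ == '__main__':
    # Callbacks run as tasks finish, so sort if input order matters.
    print(sorted(main()))
```

Because the callbacks fire in completion order, the collected results are not guaranteed to match the order of the input, which is why the sketch sorts them before printing.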
