Speed-up a single task using multi-processing or threading

2024/10/9 14:18:57

Is it possible to speed up a single task using multi-processing/threading? My gut feeling is that the answer is 'no'. Here is an example of what I mean by a "single task":

for i in range(max):
    pick = random.choice(['on', 'off', 'both'])

With max set to 10000000 it takes about 7.9 seconds to complete on my system.

I have a basic grasp of how to use multi-processing and threading for multiple tasks. For example, if I have 10 directories, each containing X files that need to be read, I could create 10 threads.

I suspect that the single task is using only a single process (task manager reports CPU usage is minimal). Is there a way to leverage my other cores in such cases? Or is increasing the CPU/Memory speeds the only way to get faster results?

Answer

Here is a benchmark of your code with and without multiprocessing:

#!/usr/bin/env python
import random
import time

def test1():
    print "for loop with no multiproc: "
    m = 10000000
    t = time.time()
    for i in range(m):
        pick = random.choice(['on', 'off', 'both'])
    print time.time()-t

def test2():
    print "map with no multiproc: "
    m = 10000000
    t = time.time()
    map(lambda x: random.choice(['on', 'off', 'both']), range(m))
    print time.time()-t

def rdc(x):
    return random.choice(['on', 'off', 'both'])

def test3():
    from multiprocessing import Pool
    pool = Pool(processes=4)
    m = 10000000
    print "map with multiproc: "
    t = time.time()
    r = pool.map(rdc, range(m))
    print time.time()-t

if __name__ == "__main__":
    test1()
    test2()
    test3()

And here is the result on my workstation (which is a quadcore):

for loop with no multiproc: 
8.31032013893
map with no multiproc: 
9.48167610168
map with multiproc: 
4.94983720779

Is it possible to speed up a single task using multi-processing/threading? My gut feeling is that the answer is 'no'.

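Well, as far as I can tell, the answer is "damn, yes": in the benchmark above, the multiprocessing version finished in about 4.9 seconds, against about 8.3 seconds for the plain loop.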

Is there a way to leverage my other cores in such cases? Or is increasing the CPU/Memory speeds the only way to get faster results?

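Yes, by using multiprocessing. CPython threads cannot run Python bytecode on several cores at once because of the GIL (global interpreter lock), but multiprocessing starts separate processes, each with its own interpreter, and your operating system's scheduler can spread those across the other cores. That is how you get a real improvement on a CPU-bound task like this one.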
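As a rough sketch of how the same idea might be pushed further: in the benchmark above, pool.map ships every one of the 10000000 arguments to a worker and every result back, so part of the measured time is likely spent moving data between processes. The Python 3 variant below hands each worker one large chunk and returns only a small tally. The names split_work, pick_many and parallel_picks are made up for this illustration, and the per-chunk seed is an assumption added so that the workers draw from independent generators.

```python
import random
from multiprocessing import Pool

CHOICES = ['on', 'off', 'both']


def split_work(total, workers):
    """Split `total` iterations into `workers` chunk sizes that sum to `total`."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def pick_many(job):
    """Run one chunk of the loop inside a worker and return a tally of picks."""
    count, seed = job
    rng = random.Random(seed)  # illustrative: one generator per chunk
    tally = {choice: 0 for choice in CHOICES}
    for _ in range(count):
        tally[rng.choice(CHOICES)] += 1
    return tally


def parallel_picks(total, workers=4):
    """Spread `total` picks over `workers` processes and merge the tallies."""
    jobs = [(count, seed) for seed, count in enumerate(split_work(total, workers))]
    with Pool(processes=workers) as pool:
        partials = pool.map(pick_many, jobs)
    combined = {choice: 0 for choice in CHOICES}
    for partial in partials:
        for choice, count in partial.items():
            combined[choice] += count
    return combined


if __name__ == "__main__":
    result = parallel_picks(10000000)
    print(result, sum(result.values()))
```

Whether this beats the plain pool.map version on a given machine is something to measure rather than assume; the point is only that each process receives one small message and sends one small message back.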
