I have a list of random numbers and I would like to get the greatest number using multiprocessing.
This is the code I used to generate the list:
import random
randomlist = []
for i in range(100000000):
    n = random.randint(1,30000000)
    randomlist.append(n)
To get the greatest number using a serial process:
import time

greatest = 0 # global variable

def f(n):
    global greatest
    if n>greatest:
        greatest = n

if __name__ == "__main__":
    t2 = time.time()
    greatest = 0
    for x in randomlist:
        f(x)
    print("serial process took:", time.time()-t2)
    print("greatest = ", greatest)
This is my try to get the greatest number using multiprocessing:
from multiprocessing import Pool
import time

greatest = 0 # the global variable

def f(n):
    global greatest
    if n>greatest:
        greatest = n

if __name__ == "__main__":
    greatest = 0
    t1 = time.time()
    p = Pool() #(processes=3)
    result = p.map(f,randomlist)
    p.close()
    p.join()
    print("pool took:", time.time()-t1)
    print("greatest = ", greatest)
The output here is 0. Apparently the global variable is not shared between the processes. How can I fix this without hurting performance?
As suggested by @Barmar, divide your randomlist into chunks, compute the local maximum of each chunk, and finally compute the global maximum from local_maximum_list:
import multiprocessing as mp
import numpy as np
import random
import time

CHUNKSIZE = 10000

def local_maximum(l):
    m = max(l)
    print(f"Local maximum: {m}")
    return m

if __name__ == '__main__':
    randomlist = np.random.randint(1, 30000000, 100000000)
    start = time.time()
    chunks = (randomlist[i:i+CHUNKSIZE]
              for i in range(0, len(randomlist), CHUNKSIZE))
    with mp.Pool(mp.cpu_count()) as pool:
        local_maximum_list = pool.map(local_maximum, chunks)
    print(f"Global maximum: {max(local_maximum_list)}")
    end = time.time()
    print(f"MP Elapsed time: {end-start:.2f}s")
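The chunk-and-reduce idea can also be sketched on a tiny input, which makes it easier to check by hand. This is only an illustration, not a tuned implementation: `iter_chunks` and `parallel_max` are hypothetical helper names, and the built-in `max` is used directly as the worker function in place of `local_maximum`, so nothing is printed per chunk.

```python
import multiprocessing as mp


def iter_chunks(seq, size):
    """Yield consecutive slices of `seq`, each holding at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def parallel_max(seq, size, processes=None):
    """Take max() of each chunk in a worker, then max() of the partial results."""
    with mp.Pool(processes) as pool:
        # The built-in max is pickled by reference, so it can serve as the worker.
        local_maxima = pool.map(max, iter_chunks(seq, size))
    return max(local_maxima)


if __name__ == "__main__":
    data = [7, 42, 3, 19, 8, 23, 15, 4]  # small illustrative input
    print(parallel_max(data, size=3, processes=2))  # prints 42
```

Each worker returns its result to the parent instead of writing to a global, so the parent process is the only place where the final maximum is assembled.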
Performance
It's very interesting how the way the random list is created impacts the performance of multiprocessing:
Scenario 1:
randomlist = np.random.randint(1, 30000000, 100000000)
MP Elapsed time: 1.63s

Scenario 2:
randomlist = np.random.randint(1, 30000000, 100000000).tolist()
MP Elapsed time: 6.02s

Scenario 3:
randomlist = [random.randint(1, 30000000) for _ in range(100000000)]
MP Elapsed time: 7.14s

Scenario 4:
randomlist = list(np.random.randint(1, 30000000, 100000000))
MP Elapsed time: 184.28s

Scenario 5:
randomlist = []
for _ in range(100000000):
    n = random.randint(1, 30000000)
    randomlist.append(n)
MP Elapsed time: 7.52s
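A possible explanation for part of these differences, offered as an assumption rather than something measured above: `Pool.map` pickles every chunk to send it to a worker, and the chunks differ in what they contain. `tolist()` produces plain Python `int` objects, while `list()` over an array produces NumPy scalar objects, which are heavier to pickle and to compare. The sketch below inspects a single chunk of `CHUNKSIZE` elements; `describe_chunk` is a hypothetical helper, and pickled size is only a rough proxy for the real transfer cost.

```python
import pickle

import numpy as np


def describe_chunk(chunk):
    """Return (element type name, pickled size in bytes) for one chunk."""
    payload = pickle.dumps(chunk)
    return type(chunk[0]).__name__, len(payload)


arr = np.random.randint(1, 30000000, 10000)  # one CHUNKSIZE-sized chunk

variants = {
    "ndarray": arr,                   # Scenario 1: slice of a NumPy array
    "ndarray.tolist()": arr.tolist(), # Scenario 2: list of Python ints
    "list(ndarray)": list(arr),       # Scenario 4: list of NumPy scalars
}

report = {name: describe_chunk(chunk) for name, chunk in variants.items()}
for name, (elem_type, size) in report.items():
    print(f"{name:18s} element type: {elem_type:8s} pickled bytes: {size}")
```

All three variants hold the same values and give the same maximum; only the element type and the serialized form differ, which is consistent with Scenario 4 being the outlier.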