I'm trying to parallelize some calculations that use numpy with the help of Python's `multiprocessing` module. Consider this simplified example:
```python
import time
import numpy
from multiprocessing import Pool

def test_func(i):
    # Purely CPU-bound numpy workload.
    a = numpy.random.normal(size=1000000)
    b = numpy.random.normal(size=1000000)
    for _ in range(2000):
        a = a + b
        b = a - b
        a = a - b
    return 1

# Time a single serial run.
t1 = time.time()
test_func(0)
single_time = time.time() - t1
print("Single time:", single_time)

# Time n_par runs in parallel.
n_par = 4
pool = Pool()
t1 = time.time()
results_async = [pool.apply_async(test_func, [i])
                 for i in range(n_par)]
results = [r.get() for r in results_async]
multicore_time = time.time() - t1

print("Multicore time:", multicore_time)
print("Efficiency:", single_time / multicore_time)
```
When I execute it, `multicore_time` is roughly equal to `single_time * n_par`, while I would expect it to be close to `single_time`. Indeed, if I replace the numpy calculations with just `time.sleep(10)` (see the variant below), I get perfect efficiency. But for some reason it does not work with numpy. Can this be solved, or is it some internal limitation of numpy?
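For reference, the sleep-based check was essentially this (a minimal sketch; the helper name `sleep_func` is illustrative):

```python
import time

# Dummy task: sleeps instead of computing. With this in place of
# test_func, all n_par workers finish in about 10 s total, i.e.
# parallel efficiency is close to perfect.
def sleep_func(i):
    time.sleep(10)
    return 1
```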
Some additional info which may be useful:
- I'm using OSX 10.9.5, Python 3.4.2, and the CPU is a Core i7 with 4 cores as reported by the system info (although the above program only takes 50% of CPU time in total, so the system info may not be taking hyperthreading into account).
- When I run this I see `n_par` processes in `top` working at 100% CPU.
- If I replace the numpy array operations with a loop and per-index operations, the efficiency rises significantly, to about 75% for `n_par = 4` (roughly like the sketch after this list).
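A minimal sketch of what I mean by the per-index variant (the helper name and the scaled-down sizes are illustrative, not the exact code I ran):

```python
import numpy

# Per-index version of test_func: the same arithmetic, but performed one
# element at a time in a Python loop instead of as whole-array numpy
# operations. Sizes are reduced here because Python-level indexing is
# much slower per element.
def test_func_indexed(i):
    a = numpy.random.normal(size=10000)
    b = numpy.random.normal(size=10000)
    for _ in range(200):
        for j in range(a.size):
            a[j] = a[j] + b[j]
            b[j] = a[j] - b[j]
            a[j] = a[j] - b[j]
    return 1
```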