I'm trying to parallelize some calculations that use `numpy` with the help of Python's `multiprocessing` module. Consider this simplified example:

```
import time
import numpy
from multiprocessing import Pool

def test_func(i):
    a = numpy.random.normal(size=1000000)
    b = numpy.random.normal(size=1000000)
    for i in range(2000):
        a = a + b
        b = a - b
        a = a - b
    return 1

t1 = time.time()
test_func(0)
single_time = time.time() - t1
print("Single time:", single_time)

n_par = 4
pool = Pool()

t1 = time.time()
results_async = [pool.apply_async(test_func, [i]) for i in range(n_par)]
results = [r.get() for r in results_async]
multicore_time = time.time() - t1

print("Multicore time:", multicore_time)
print("Efficiency:", single_time / multicore_time)
```

When I execute it, `multicore_time` is roughly equal to `single_time * n_par`, while I would expect it to be close to `single_time`. Indeed, if I replace the `numpy` calculations with just `time.sleep(10)`, I get exactly that: perfect efficiency. But for some reason it does not work with `numpy`. Can this be solved, or is it some internal limitation of `numpy`?
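For comparison, the `time.sleep` control experiment mentioned above can be sketched like this (a shorter sleep than the original 10 s, just to keep the run quick; `sleep_func` and `measure_efficiency` are names invented for this sketch, not part of the original program):

```python
import time
from multiprocessing import Pool

def sleep_func(i):
    # Pure waiting instead of numpy work: the workers do not compete
    # for CPU or memory bandwidth, so they overlap almost perfectly.
    time.sleep(1)
    return 1

def measure_efficiency(func, n_par=4):
    # Hypothetical helper: time one serial call, then n_par concurrent
    # calls, and return the single_time / multicore_time ratio.
    t1 = time.time()
    func(0)
    single_time = time.time() - t1

    with Pool(n_par) as pool:
        t1 = time.time()
        async_results = [pool.apply_async(func, [i]) for i in range(n_par)]
        for r in async_results:
            r.get()
        multicore_time = time.time() - t1
    return single_time / multicore_time

if __name__ == "__main__":
    # With sleeping workers the ratio comes out close to 1.0.
    print("Efficiency:", measure_efficiency(sleep_func))
```

With sleeping workers all four calls run concurrently, so the measured ratio stays near 1; the puzzle is why the same harness with `numpy` work does not.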

Some additional info which may be useful:

- I'm using OSX 10.9.5, Python 3.4.2, and the CPU is a Core i7 with (as reported by the system info) 4 cores (although the above program only takes 50% of total CPU time, so the system info may not be taking hyperthreading into account).
- When I run this, I see `n_par` processes in `top` working at 100% CPU.
- If I replace the `numpy` array operations with a loop and per-index operations, the efficiency rises significantly (to about 75% for `n_par = 4`).