Inefficient multiprocessing of numpy-based calculations

2024/9/8 11:06:53

I'm trying to parallelize some calculations that use numpy with the help of Python's multiprocessing module. Consider this simplified example:

import time
import numpyfrom multiprocessing import Pooldef test_func(i):a = numpy.random.normal(size=1000000)b = numpy.random.normal(size=1000000)for i in range(2000):a = a + bb = a - ba = a - breturn 1t1 = time.time()
test_func(0)
single_time = time.time() - t1
print("Single time:", single_time)n_par = 4
pool = Pool()t1 = time.time()
results_async = [pool.apply_async(test_func, [i])for i in range(n_par)]
results = [r.get() for r in results_async]
multicore_time = time.time() - t1print("Multicore time:", multicore_time)
print("Efficiency:", single_time / multicore_time)

When I execute it, the multicore_time is roughly equal to single_time * n_par, while I would expect it to be close to single_time. Indeed, if I replace numpy calculations with just time.sleep(10), this is what I get — perfect efficiency. But for some reason it does not work with numpy. Can this be solved, or is it some internal limitation of numpy?

Some additional info which may be useful:

  • I'm using OSX 10.9.5, Python 3.4.2 and the CPU is Core i7 with (as reported by the system info) 4 cores (although the above program only takes 50% of CPU time in total, so the system info may not be taking into account hyperthreading).

  • when I run this I see n_par processes in top working at 100% CPU

  • if I replace numpy array operations with a loop and per-index operations, the efficiency rises significantly (to about 75% for n_par = 4).

Answer

It looks like the test function you're using is memory bound. That means that the run time you're seeing is limited by how fast the computer can pull the arrays from memory into cache. For example, the line a = a + b is actually using 3 arrays, a, b and a new array that will replace a. These three arrays are about 8MB each (1e6 floats * 8 bytes per floats). I believe the different i7s have something like 3MB - 8MB of shared L3 cache so you cannot fit all 3 arrays in cache at once. Your cpu adds the floats faster than the array can be loaded into cache so most of the time is spent waiting on the array to be read from memory. Because the cache is shared between the cores, you don't see any speedup by spreading the work onto multiple cores.

Memory bound operations are an issue for numpy in general and the only way I know to deal with them is to use something like cython or numba.

https://en.xdnf.cn/q/72896.html

Related Q&A

SQLite: return only top 2 results within each group

I checked other solutions to similar problems, but sqlite does not support row_number() and rank() functions or there are no examples which involve joining multiple tables, grouping them by multiple co…

Python list.append if not in list vs set.add performance [duplicate]

This question already has answers here:Which is faster and why? Set or List?(3 answers)Closed 6 years ago.Which is more performant, and what is asymptotic complexity (or are they equivalent) in Pytho…

using the hardware rng from python

Are there any ready made libraries so that the intel hardware prng (rdrand) can be used by numpy programs to fill buffers of random numbers?Failing this can someone point me in the right direction for…

How do I revert sys.stdout.close()?

In the interactive console:>>> import sys >>> sys.stdout <open file <stdout>, mode w at 0xb7810078> >>> sys.stdout.close() >>> sys.stdout # confirming th…

Find a value from x axis that correspond to y axis in matplotlib python

I am trying to do simple task such as to read values of x axis that corresponds to value of y axis in matplotlib and I cannot see what is wrong. In this case I am interested for example to find which v…

Django accessing OneToOneField

Made a view that extended User:class Client(models.Model):user = models.OneToOneField(User, related_name=user)def __unicode__(self):return "%s" % (self.user) I am currently trying to access…

Pandas DataFrame: copy the contents of a column if it is empty

I have the following DataFrame with named columns and index:a a* b b* 1 5 NaN 9 NaN 2 NaN 3 3 NaN 3 4 NaN 1 NaN 4 NaN 9 NaN 7The data…

Solving the most profit algorithm [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.Want to improve this question? Update the question so it focuses on one problem only by editing this post.Closed 9…

Get combobox value in python

Im developing an easy program and I need to get the value from a Combobox. It is easy when the Combobox is in the first created window but for example if I have two windows and the Combobox is in the s…

PyAudio (PortAudio issue) Python

I installed pyaudio with anaconda python. Using conda install pyaudio on windows. It said it installed and it also installed PortAudio with it.However, when I create my file and run it now I get the fo…