Most efficient way to multiply a small matrix with a scalar in numpy

2024/9/21 8:03:25

I have a program whose main performance bottleneck involves multiplying matrices which have one dimension of size 1 and another large dimension, e.g. 1000:

import numpy as np

large_dimension = 1000
a = np.random.random((1,))
b = np.random.random((1, large_dimension))
c = np.matmul(a, b)

In other words, I am multiplying the matrix b by the scalar a[0].
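A quick check of that equivalence, as a minimal sketch (the names c_matmul and c_scaled are illustrative, not from the question). One detail worth knowing: because a is 1-D, np.matmul returns a 1-D result of shape (1000,), whereas a[0] * b keeps b's shape of (1, 1000). The values are the same.

```python
import numpy as np

large_dimension = 1000
a = np.random.random((1,))
b = np.random.random((1, large_dimension))

# matmul promotes the 1-D operand to shape (1, 1), multiplies,
# then drops the axis it added, so the result is 1-D.
c_matmul = np.matmul(a, b)

# Broadcasting a scalar over b keeps b's 2-D shape.
c_scaled = a[0] * b

print(c_matmul.shape)
print(c_scaled.shape)
print(np.allclose(c_matmul, c_scaled[0]))
```

If downstream code depends on the result's shape, the two forms are not drop-in replacements for each other without a reshape.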

I am looking for the most efficient way to compute this, since this operation is repeated millions of times.

I timed the two obvious ways to do this, and they perform practically the same:

%timeit np.matmul(a, b)
>> 1.55 µs ± 45.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit a[0] * b
>> 1.77 µs ± 34.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Is there a more efficient way to compute this?

  • Note: I cannot move these computations to a GPU, since the program uses multiprocessing and many such computations run in parallel.

Answer

large_dimension = 1000
a = np.random.random((1,))
B = np.random.random((1, large_dimension))

%timeit np.matmul(a, B)
5.43 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit a[0] * B
5.11 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Convert the NumPy scalar to a plain Python float:

%timeit float(a[0]) * B
3.48 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

To avoid allocating a new result array on every call, write into a preallocated buffer:

buffer = np.empty_like(B)

%timeit np.multiply(float(a[0]), B, buffer)
2.96 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
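The third positional argument of np.multiply is its out parameter. A minimal sketch of what that does, using the keyword form for readability (the names result, same_object and matches are illustrative):

```python
import numpy as np

a = np.random.random((1,))
B = np.random.random((1, 1000))

# Allocated once, with the same shape and dtype as B.
buffer = np.empty_like(B)

# The product is written into `buffer` instead of a freshly allocated array.
result = np.multiply(float(a[0]), B, out=buffer)

# The ufunc returns the `out` array itself, not a copy of it.
same_object = result is buffer
matches = np.allclose(buffer, float(a[0]) * B)
print(same_object, matches)
```

The saving comes from skipping one array allocation per call, which matters here only because the multiplication itself is so cheap.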

To avoid the repeated attribute lookup on np, bind the function to an alias:

mul = np.multiply

%timeit mul(float(a[0]), B, buffer)
2.73 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

I also recommend avoiding NumPy scalars altogether. Converting to a Python float once, outside the timed call, makes the computation faster still:

a_float = float(a[0])

%timeit mul(a_float, B, buffer)
1.94 µs ± 5.74 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
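To reproduce this comparison outside IPython, something like the following sketch with the standard timeit module should work. The variants dictionary, n_calls, and the reporting format are my own choices, and each lambda adds a small constant call overhead, so the absolute numbers will be somewhat higher than the %timeit figures above and will vary by machine.

```python
import timeit

import numpy as np

a = np.random.random((1,))
B = np.random.random((1, 1000))
buffer = np.empty_like(B)
mul = np.multiply
a_float = float(a[0])

# Each variant from the answer, wrapped in a zero-argument callable.
variants = {
    "np.matmul(a, B)": lambda: np.matmul(a, B),
    "a[0] * B": lambda: a[0] * B,
    "float(a[0]) * B": lambda: float(a[0]) * B,
    "mul(a_float, B, buffer)": lambda: mul(a_float, B, buffer),
}

n_calls = 20_000  # calls per timing run (an arbitrary choice)
timings = {}
for label, fn in variants.items():
    # Take the best of 5 runs to reduce noise from other processes.
    best = min(timeit.repeat(fn, number=n_calls, repeat=5))
    timings[label] = best / n_calls * 1e6  # microseconds per call
    print(f"{label:26s} {timings[label]:.2f} µs per call")
```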

Furthermore, if the computation runs inside a loop, initialize the buffer once, outside the loop:

rng = range(1000)

%%timeit
for i in rng:
    pass
24.4 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
for i in rng:
    mul(a_float, B, buffer)
1.91 ms ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

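So, after subtracting the overhead of the empty loop (24.4 µs ≈ 0.02 ms) and dividing by the 1000 iterations:

best_iteration_time = (1.91 ms - 0.02 ms) / 1000 ≈ 1.89 µs

speedup = 5.43 µs / 1.89 µs ≈ 2.87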
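One caveat with reusing a single buffer, sketched here under the assumption that the scalar changes from iteration to iteration (the scalars list and the kept_views / kept_copies names are hypothetical): each call overwrites the previous result, so anything that must outlive the iteration has to be consumed or copied first.

```python
import numpy as np

B = np.random.random((1, 1000))
buffer = np.empty_like(B)  # allocated once, outside the loop
mul = np.multiply

scalars = [0.5, 2.0, 3.0]  # hypothetical per-iteration scalars

kept_views = []
kept_copies = []
for s in scalars:
    out = mul(s, B, buffer)         # writes into, and returns, `buffer`
    kept_views.append(out)          # every entry is the same array object
    kept_copies.append(out.copy())  # an independent snapshot of this iteration

# All "views" alias the buffer, so they all show the last product.
all_same = all(v is buffer for v in kept_views)
print(all_same)
print(np.allclose(kept_views[0], 3.0 * B))
print(np.allclose(kept_copies[0], 0.5 * B))
```

Copying reintroduces the allocation the buffer was meant to avoid, so the trick pays off mainly when each result is used up within its own iteration.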
