How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?

2024/11/19 15:22:12

Suppose I have a 2d sparse array. In my real usecase both the number of rows and columns are much bigger (say 20000 and 50000) hence it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],[ 0.,  0., -1.],[ 0.,  0.,  0.],[ 0.,  0.,  0.],[ 0.,  2.,  0.]])

Now suppose I have a dense 1d array with all non-zeros components with size 3 (or 50000 in my real life case):

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy are of the np.matrix: the '*' operator is overloaded to have it behave like a matrix-multiply instead of the elementwise-multiply:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])

One solution would be to make 'a' switch to the array semantics for the '*' operator, that would give the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],[ 0.,  0., -3.],[ 0.,  0.,  0.],[ 0.,  0.,  0.],[ 0.,  6.,  0.]])

But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):

>>> ssp.issparse(a.toarray())
False

Any idea how to build this while keeping only sparse datastructures and without having to do a unefficient python loop on the columns of 'a'?

Answer

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],[ 0.,  0., -3.],[ 0.,  0.,  0.],[ 0.,  0.,  0.],[ 0.,  6.,  0.]])

Hope that helps!

https://en.xdnf.cn/q/26426.html

Related Q&A

Do I have to do StringIO.close()?

Some code:import cStringIOdef f():buffer = cStringIO.StringIO()buffer.write(something)return buffer.getvalue()The documentation says:StringIO.close(): Free the memory buffer. Attempting to do furtherop…

Python: ulimit and nice for subprocess.call / subprocess.Popen?

I need to limit the amount of time and cpu taken by external command line apps I spawn from a python process using subprocess.call , mainly because sometimes the spawned process gets stuck and pins the…

array.shape() giving error tuple not callable

I have a 2D numpy array called results, which contains its own array of data, and I want to go into it and use each list: for r in results:print "r:"print ry_pred = np.array(r)print y_pred.sh…

Python unittest - Ran 0 tests in 0.000s

So I want to do this code Kata for practice. I want to implement the kata with tdd in separate files:The algorithm:# stringcalculator.py def Add(string):return 1and the tests:# stringcalculator.spec.…

How and where does py.test find fixtures

Where and how does py.test look for fixtures? I have the same code in 2 files in the same folder. When I delete conftest.py, cmdopt cannot be found running test_conf.py (also in same fo…

Python match a string with regex [duplicate]

This question already has answers here:What exactly do "u" and "r" string prefixes do, and what are raw string literals?(7 answers)What exactly is a "raw string regex" an…

What are the consequences of disabling gossip, mingle and heartbeat for celery workers?

What are the implications of disabling gossip, mingle, and heartbeat on my celery workers?In order to reduce the number of messages sent to CloudAMQP to stay within the free plan, I decided to follow …

How to determine if an exception was raised once youre in the finally block?

Is it possible to tell if there was an exception once youre in the finally clause? Something like:try:funky code finally:if ???:print(the funky code raised)Im looking to make something like this m…

How to use re match objects in a list comprehension

I have a function to pick out lumps from a list of strings and return them as another list:def filterPick(lines,regex):result = []for l in lines:match = re.search(regex,l)if match:result += [match.grou…

How do I run pip on python for windows? [duplicate]

This question already has answers here:Why does "pip install" inside Python raise a SyntaxError?(6 answers)Closed 8 years ago.Ive just installed python 3.5, ran Python 3.5 (32-bit) and typed…