Question 1

Suppose I have a 2d sparse array. In my real use case both the number of rows and the number of columns are much bigger (say 20000 and 50000), so it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp
>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])

Now suppose I have a dense 1d array whose components are all non-zero, of size 3 (or 50000 in my real-life case):

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy follow the np.matrix semantics: the '*' operator is overloaded to behave like a matrix multiply instead of an elementwise multiply:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])

One solution would be to make 'a' switch to the array semantics for the '*' operator, which would give the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])

But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):

>>> ssp.issparse(a.toarray())
False

Any idea how to build this while keeping only sparse data structures and without having to do an inefficient Python loop over the columns of 'a'?

Question 2

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
	with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])
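
For reference, the same diagonal-matrix trick can be wrapped in a small helper. This is only a sketch: the function name scale_columns is made up for illustration, and it assumes a scipy version that provides scipy.sparse.diags, which builds the diagonal matrix directly instead of going through lil_matrix and setdiag.

```python
import numpy as np
import scipy.sparse as ssp


def scale_columns(a, d):
    """Multiply column j of the sparse matrix `a` by d[j], staying sparse.

    `scale_columns` is a hypothetical helper name, not part of scipy.
    """
    d = np.asarray(d).ravel()
    if a.shape[1] != d.shape[0]:
        raise ValueError("d must have one entry per column of a")
    # Sparse diagonal matrix with d on the main diagonal (offset 0).
    diag = ssp.diags(d, 0, format="csr")
    # Sparse-by-sparse matrix product: only the stored entries are touched.
    return a.tocsr().dot(diag)


# Same toy data as in the question.
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
d = np.ones(3) * 3

result = scale_columns(a, d)
print(ssp.issparse(result))
print(result.toarray())
```

Calling toarray() at the end is only reasonable for the toy 5x3 example; in the real 20000 x 50000 case the result should stay sparse.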

Hope that helps!

How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?
