I'm trying to follow some example code provided in the documentation for np.linalg.svd
in order to compare term and document similarities following an SVD on a term-document matrix (TDM). Here's what I've got:
results_t = np.linalg.svd(tdm_t)
results_t[1].shape
yields
(1285,)
Also
results_t[2].shape
(5334, 5334)
So then, trying to place these results into a full S
matrix per the classic SVD projection approach, I've got:
S = np.zeros((results_t[0].shape[0], results_t[2].shape[0]), dtype = float)
S[:results_t[2].shape[0], :results_t[2].shape[0]] = results_t[1]
This last line produces the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-329-16e79bc97c4b> in <module>()
----> 1 S[:results_t[2].shape[0], :results_t[2].shape[0]] = results_t[1]

ValueError: could not broadcast input array from shape (1285) into shape (1285,5334)
What am I doing wrong here?
So according to the error message, the target S[:results_t[2].shape[0], :results_t[2].shape[0]] is (1285,5334), while the source results_t[1] is (1285,).
So numpy has to broadcast the source to a shape that matches the target before it can do the assignment. The same would apply if you tried to add or multiply two arrays with these shapes.
The first broadcasting step is to make the number of dimensions match. The source is 1d, so it needs to become 2d. numpy will try results_t[1][np.newaxis,:], producing (1, 1285).
The second step is to expand all size 1 dimensions to match the other array. But that can't happen here, hence the error: in (1285,5334) versus (1,1285), the last dimensions are 5334 and 1285, and neither of them is 1.
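The two broadcasting steps can be reproduced on a small scale. This is only a sketch: the shapes (3, 5) and (3,) are stand-ins I chose for (1285, 5334) and (1285,), and the names target, s and failed are illustrative, not from the question.

```python
import numpy as np

# Small stand-ins for the shapes in the question (assumed for illustration):
# target plays the role of S, s plays the role of results_t[1].
target = np.zeros((3, 5))
s = np.array([1.0, 2.0, 3.0])          # shape (3,)

# Step 1: a 1d source gets a new leading axis, (3,) -> (1, 3).
step1_shape = s[np.newaxis, :].shape

# Step 2: size-1 axes are stretched to match the target. The trailing
# axis is 3 against 5, neither is 1, so the assignment raises ValueError.
try:
    target[:, :] = s
    failed = False
except ValueError as exc:
    failed = True
    print("assignment failed:", exc)
```

The same ValueError should appear with np.broadcast_to(s, target.shape), which is a quick way to check whether two shapes are compatible without touching the target array.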
======
If you want to assign to a block (or all of S), then use:
S[:results_t[2].shape[0], :results_t[2].shape[0]] = results_t[1][:,np.newaxis]
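To see what this block assignment actually writes, here is a minimal sketch on a (3, 5) array; the small shapes and the values in s are assumptions standing in for the real results_t data.

```python
import numpy as np

# Stand-ins for S and results_t[1] (shapes and values assumed for illustration).
S = np.zeros((3, 5))
s = np.array([1.0, 2.0, 3.0])

# s[:, np.newaxis] has shape (3, 1); the size-1 axis stretches to 5,
# so every column of S receives a copy of s.
S[:, :] = s[:, np.newaxis]
print(S)
```

Note that this fills row i entirely with s[i]. It makes the shapes compatible, but it does not produce the diagonal matrix that the SVD product needs.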
To assign results_t[1] to the diagonal of S, use list indexing (instead of slices):
S[range(results_t[1].shape[0]), range(results_t[1].shape[0])] = results_t[1]
or
S[np.diag_indices(results_t[1].shape[0])] = results_t[1]
In this case those ranges have to match results_t[1] in length.
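Putting the diagonal assignment together, here is a hedged end-to-end sketch on a small matrix. The matrix A and the names U, s, Vh, k and ok are illustrative stand-ins for tdm_t and the three entries of results_t; it relies on the default full_matrices=True of np.linalg.svd.

```python
import numpy as np

# A small stand-in for tdm_t (values assumed for illustration).
A = np.arange(15, dtype=float).reshape(3, 5)

# With the default full_matrices=True: U is (3, 3), s is (3,), Vh is (5, 5).
U, s, Vh = np.linalg.svd(A)

# S must be (rows of U) x (rows of Vh) so that U @ S @ Vh is defined.
S = np.zeros((U.shape[0], Vh.shape[0]), dtype=float)

# Write the singular values onto the main diagonal only.
k = s.shape[0]
S[np.diag_indices(k)] = s

# Check that the factors reproduce the original matrix.
ok = np.allclose(A, U @ S @ Vh)
print(S.shape, ok)
```

If the reconstruction check passes, S has the right shape and contents for the usual U @ S @ Vh product.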