I want to compute the covariance C
of n
measurements of p
quantities, where each individual quantity measurement is given its own weight. That is, my weight array W
has the same shape as my quantity array Q
(n
by p
). The native np.cov()
function only supports weights given to individual measurements (i.e., a vector of length n
).
I can initialize a p
by p
matrix and iterate, but if p
is large, then it's a very slow process.
Since Q
is known to have mean zero for each quantity (column of Q
), the explicit formula for each element of C
is
C[i,j] = np.sum(Q[:, i] * Q[:, j] * W[:, i] * W[:, j]) / np.sum(W[:, i] * W[:, j])
If I rearrange the numerator to be Q[:, i] * W[:, i] * Q[:, j] * W[:, j]
, it seems like I should be able to multiply and sum columns of Q * W
, and then do the denominator similarly (except using W * W
).
Is there a way to do this with np.einsum()
?
For testing, let's define the following:
C = array([[ 1. , 0.1 , 0.2 ], # set this beforehand, to test whether [ 0.1 , 0.5 , 0.15], # we get the correct result[ 0.2 , 0.15, 0.75]])Q = array([[-0.6084634 , 0.16656143, -1.04490324],[-1.51164337, -0.96403094, -2.37051952],[-0.32781346, -0.19616374, -1.32591578],[-0.88371729, 0.20877833, -0.52074272],[-0.67987913, -0.84458226, 0.02897935],[-2.01924756, -0.51877396, -0.68483981],[ 1.64600477, 0.67620595, 1.24559591],[ 0.82554885, 0.14884613, -0.15211434],[-0.88119527, 0.11663335, -0.31522598],[-0.14830668, 1.26906561, -0.49686309]])W = array([[ 1.01133857, 0.91962164, 1.01897898],[ 1.09467975, 0.91191381, 0.90150961],[ 0.96334661, 1.00759046, 1.01638749],[ 1.04827001, 0.95861001, 1.01248969],[ 0.91572506, 1.09388218, 1.03616461],[ 0.9418178 , 1.07210878, 0.90431879],[ 1.0093642 , 1.00408472, 1.07570172],[ 0.92203074, 1.00022631, 1.09705542],[ 0.99775598, 0.01000000, 0.94996408],[ 1.02996389, 1.01224303, 1.00331465]])