I'm trying to figure out how to use PCA to decorrelate an RGB image in Python. I'm using the code found in the O'Reilly computer vision book:
from PIL import Image
from numpy import *

def pca(X):
    # Principal Component Analysis
    # input: X, matrix with training data as flattened arrays in rows
    # return: projection matrix (with important dimensions first),
    # variance and mean

    # get dimensions
    num_data, dim = X.shape

    # center data
    mean_X = X.mean(axis=0)
    for i in range(num_data):
        X[i] -= mean_X

    if dim > 100:
        print('PCA - compact trick used')
        M = dot(X, X.T)  # covariance matrix
        e, EV = linalg.eigh(M)  # eigenvalues and eigenvectors
        tmp = dot(X.T, EV).T  # this is the compact trick
        V = tmp[::-1]  # reverse since last eigenvectors are the ones we want
        S = sqrt(e)[::-1]  # reverse since eigenvalues are in increasing order
    else:
        print('PCA - SVD used')
        U, S, V = linalg.svd(X)
        V = V[:num_data]  # only makes sense to return the first num_data

    # return the projection matrix, the variance and the mean
    return V, S, mean_X
I know I need to flatten my image, but its shape is 512x512x3. Will the dimension of 3 throw off my result? How do I truncate this? How do I quantify how much information is retained?
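For reference, here is a minimal sketch of one way the flattening and the "information retained" figure could be computed. It assumes each pixel is treated as one sample with three features (R, G, B), so a 512x512x3 image becomes a 262144x3 matrix. The function name `decorrelate_rgb` and the synthetic test image are hypothetical stand-ins, not part of the book's code. It works from the 3x3 covariance matrix rather than calling `pca(X)` above, since `linalg.svd` with its default `full_matrices=True` would try to build a 262144x262144 matrix for data of this shape.

```python
import numpy as np

def decorrelate_rgb(img):
    """Hypothetical helper: PCA across the color channels of an H x W x 3 image.

    Returns the decorrelated image, the eigenvectors (as columns),
    the eigenvalues (largest first), and the per-channel mean.
    """
    h, w, c = img.shape
    X = img.reshape(-1, c).astype(np.float64)  # one row per pixel: (H*W, 3)
    mean = X.mean(axis=0)
    Xc = X - mean                              # center each channel

    cov = np.cov(Xc, rowvar=False)             # 3x3 channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Xc @ eigvecs                      # project pixels onto the components
    return scores.reshape(h, w, c), eigvecs, eigvals, mean

# Synthetic stand-in for a real image: strongly correlated channels (assumption).
rng = np.random.default_rng(0)
base = rng.random((64, 64, 1))
img = np.clip(base + 0.1 * rng.random((64, 64, 3)), 0.0, 1.0)

pcs, eigvecs, eigvals, mean = decorrelate_rgb(img)

# Fraction of total variance carried by each component, and the running total.
explained = eigvals / eigvals.sum()
retained = np.cumsum(explained)

# Truncation: keep the first k components and map back to RGB.
k = 2
flat = pcs.reshape(-1, 3)
approx = (flat[:, :k] @ eigvecs[:, :k].T + mean).reshape(img.shape)
```

Under these assumptions, `retained[k - 1]` is the fraction of the total channel variance kept when only the first `k` components are used, which is one common way to put a number on "how much information is retained". For a real image, something like `np.asarray(Image.open(path))` could replace the synthetic array.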