Is this the correct way of whitening an image in python?

2024/10/3 12:30:42

I am trying to zero-center and whiten the CIFAR10 dataset, but the result I get looks like random noise!
The CIFAR10 dataset contains 60,000 color images of size 32x32: 50,000 in the training set and 10,000 in the test set.
The following snippets of code show the process I used to whiten the dataset:

# zero-center
mean = np.mean(data_train, axis = (0,2,3))
for i in range(data_train.shape[0]):
    for j in range(data_train.shape[1]):
        data_train[i,j,:,:] -= mean[j]

first_dim = data_train.shape[0] #50,000
second_dim = data_train.shape[1] * data_train.shape[2] * data_train.shape[3] # 3*32*32
shape = (first_dim, second_dim) # (50000, 3072)
# compute the covariance matrix
cov = np.dot(data_train.reshape(shape).T, data_train.reshape(shape)) / data_train.shape[0]
# compute the SVD factorization of the data covariance matrix
U,S,V = np.linalg.svd(cov)
print 'cov.shape = ',cov.shape
print U.shape, S.shape, V.shape
Xrot = np.dot(data_train.reshape(shape), U) # decorrelate the data
Xwhite = Xrot / np.sqrt(S + 1e-5)
print Xwhite.shape
data_whitened = Xwhite.reshape(-1,32,32,3)
print data_whitened.shape

outputs:

cov.shape =  (3072L, 3072L)
(3072L, 3072L) (3072L,) (3072L, 3072L)
(50000L, 3072L)
(50000L, 32L, 32L, 3L)
(32L, 32L, 3L)

and trying to show the resulting image :

import matplotlib.pyplot as plt
%matplotlib inline
from scipy.misc import imshow
print data_whitened[0].shape
fig = plt.figure()
plt.subplot(221)
plt.imshow(data_whitened[0])
plt.subplot(222)
plt.imshow(data_whitened[100])
plt.show()

enter image description here

By the way, data_train[0].shape is (3,32,32), but if I reshape the whitened image according to that I get

TypeError: Invalid dimensions for image data

Could this be a visualization issue only? If so, how can I make sure that's the case?

Update :
Thanks to @AndrasDeak, I fixed the visualization code this way, but still the output looks random :

data_whitened = Xwhite.reshape(-1,3,32,32).transpose(0,2,3,1)
print data_whitened.shape
fig = plt.figure()
plt.subplot(221)
plt.imshow(data_whitened[0])

enter image description here

Update 2:
This is what I get when I run some of the commands given below. As can be seen, toimage shows the image just fine, but reshaping it messes up the image.

enter image description here

# output is of shape (N, 3, 32, 32)
X = X.reshape((-1,3,32,32))
# output is of shape (N, 32, 32, 3)
X = X.transpose(0,2,3,1)
# put data back into a design matrix (N, 3072)
X = X.reshape(-1, 3072)
plt.imshow(X[6].reshape(32,32,3))
plt.show()

enter image description here

For some weird reason, this was what I got at first, but after several tries it changed to the previous image.

enter image description here

Answer

Let's walk through this. As you point out, CIFAR contains images which are stored in a matrix; each image is a row, and each row has 3072 columns of uint8 numbers (0-255). Images are 32x32 pixels and pixels are RGB (three channel colour).

# https://www.cs.toronto.edu/~kriz/cifar.html
# wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
# tar xf cifar-10-python.tar.gz
import numpy as np
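import cPickle   # Python 2; on Python 3 use `import pickle` instead
# open in binary mode, since the batch file is a pickle
with open('cifar-10-batches-py/data_batch_1', 'rb') as input_file:
    X = cPickle.load(input_file)   # Python 3: pickle.load(input_file, encoding='bytes')
X = X['data']   # shape is (N, 3072); with encoding='bytes' the key is b'data'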

It turns out that the columns are ordered a bit funny: all the red pixel values come first, then all the green pixels, then all the blue pixels. This makes it tricky to have a look at the images. This:

import matplotlib.pyplot as plt
plt.imshow(X[6].reshape(32,32,3))
plt.show()

gives this:

Mixed up colour channels

So, just for ease of viewing, let's shuffle the dimensions of our matrix around with reshape and transpose:

# output is of shape (N, 3, 32, 32)
X = X.reshape((-1,3,32,32))
# output is of shape (N, 32, 32, 3)
X = X.transpose(0,2,3,1)
# put data back into a design matrix (N, 3072)
X = X.reshape(-1, 3072)

Now:

plt.imshow(X[6].reshape(32,32,3))
plt.show()

gives:

A peacock
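To see why the plain reshape scrambles the colours, it can help to try the same steps on a tiny made-up example. The sketch below assumes a hypothetical 2x2 "image" stored channel-first in a flat row (the same convention as the CIFAR rows) and uses Python 3 with NumPy; `row`, `naive` and `img` are illustrative names only.

```python
import numpy as np

# Hypothetical 2x2 RGB image stored CIFAR-style: all red values,
# then all green values, then all blue values.
row = np.array([1, 1, 1, 1,    # red plane
                2, 2, 2, 2,    # green plane
                3, 3, 3, 3])   # blue plane

# A naive reshape treats consecutive values as (r, g, b) triples,
# so the first "pixel" is made of three red values.
naive = row.reshape(2, 2, 3)

# Reshape to (channel, height, width), then move the channel axis last.
img = row.reshape(3, 2, 2).transpose(1, 2, 0)

print(naive[0, 0])  # [1 1 1]
print(img[0, 0])    # [1 2 3]
```

After the reshape and transpose, every pixel holds one value from each colour plane, which is what imshow expects.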

OK, on to ZCA whitening. We're frequently reminded that it's super important to zero-center the data before whitening it. At this point, an observation about the code you include. From what I can tell, computer vision views color channels as just another feature dimension; there's nothing special about the separate RGB values in an image, just like there's nothing special about the separate pixel values. They're all just numeric features. So, whereas you're computing the average pixel value, respecting colour channels (i.e., your mean is a tuple of r,g,b values), we'll just compute the average image value. Note that X is a big matrix with N rows and 3072 columns. We'll treat every column as being "the same kind of thing" as every other column.

# zero-centre the data (this computes a separate mean for every pixel
# position and colour channel, i.e. one mean per column)
X = X - X.mean(axis=0)
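The difference between the two kinds of mean is easiest to see from their shapes. This is a minimal sketch on made-up random data standing in for CIFAR (Python 3 with NumPy, channels-last layout assumed); `channel_mean`, `feature_mean` and `X_centred` are names I've introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for CIFAR data: 100 made-up images, channels-last, flattened.
X = rng.integers(0, 256, size=(100, 32 * 32 * 3)).astype(np.float64)

# Per-channel mean (one value per colour channel), as in the question.
channel_mean = X.reshape(-1, 32, 32, 3).mean(axis=(0, 1, 2))   # shape (3,)

# Per-feature mean (one value per column), as used in this answer.
feature_mean = X.mean(axis=0)                                  # shape (3072,)

X_centred = X - feature_mean
print(channel_mean.shape, feature_mean.shape)
print(np.abs(X_centred.mean(axis=0)).max())  # numerically close to zero
```

Only the per-feature version guarantees that every column of the design matrix has zero mean, which is the property the covariance step relies on.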

At this point, let's also do Global Contrast Normalization, which is quite often applied to image data. I'll use the L2 norm, which makes every image have vector magnitude 1:

X = X / np.sqrt((X ** 2).sum(axis=1))[:,None]

One could easily use something else, like the standard deviation (X = X / np.std(X, axis=0)) or min-max scaling to some interval like [-1,1].
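As a rough sketch of the L2 version, assuming Python 3 with NumPy and made-up data: the helper name `l2_normalise_rows` and its `eps` guard against all-zero rows are my own additions, not part of the one-liner above.

```python
import numpy as np

def l2_normalise_rows(X, eps=1e-8):
    """Scale each row of X to unit L2 norm; eps guards against all-zero rows."""
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    return X / np.maximum(norms, eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3072))        # made-up, already roughly zero-centred
X_gcn = l2_normalise_rows(X)
print(np.linalg.norm(X_gcn, axis=1))  # each entry is 1 up to rounding
```

Using `keepdims=True` does the same job as the `[:,None]` indexing: it keeps the norms as a column so they broadcast across each row.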

Nearly there. At this point, we haven't greatly modified our data, since we've just shifted and scaled it (a linear transform). To display it, we need to get image data back into the range [0,1], so let's use a helper function:

def show(i):
    i = i.reshape((32,32,3))
    m,M = i.min(), i.max()
    plt.imshow((i - m) / (M - m))
    plt.show()

show(X[6])

The peacock looks slightly brighter here, but that's just because we've stretched its pixel values to fill the interval [0,1]:

Slightly brighter peacock
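The rescaling inside the helper can be checked on its own, without plotting. This is a small sketch in Python 3 with NumPy; `rescale01` is a hypothetical name, and the special case for a constant image is an assumption I've added to avoid dividing by zero.

```python
import numpy as np

def rescale01(img):
    """Linearly map img so its minimum becomes 0 and its maximum becomes 1."""
    lo, hi = img.min(), img.max()
    if hi == lo:                      # constant image: avoid dividing by zero
        return np.zeros_like(img, dtype=np.float64)
    return (img - lo) / (hi - lo)

x = np.array([-0.2, 0.0, 0.3])        # made-up zero-centred values
print(rescale01(x))
```

Because the mapping is linear and increasing, it changes brightness and contrast but not which pixels are lighter or darker than others.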

ZCA whitening:

# compute the covariance of the image data
cov = np.cov(X, rowvar=True)   # cov is (N, N)
# singular value decomposition
U,S,V = np.linalg.svd(cov)     # U is (N, N), S is (N,)
# build the ZCA matrix
epsilon = 1e-5
zca_matrix = np.dot(U, np.dot(np.diag(1.0/np.sqrt(S + epsilon)), U.T))
# transform the image data       zca_matrix is (N,N)
zca = np.dot(zca_matrix, X)    # zca is (N, 3072)

Taking a look (show(zca[6])):

"Whitened" peacock

Now the peacock definitely looks different. You can see that the ZCA has rotated the image through colour space, so it looks like a picture on an old TV with the Tone setting out of whack. Still recognisable, though.

Presumably because of the epsilon value I used, the covariance of my transformed data isn't exactly identity, but it's fairly close:

>>> (np.cov(zca, rowvar=True).argmax(axis=1) == np.arange(zca.shape[0])).all()
True
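For comparison, here is a minimal sketch of the other common convention, in which the covariance is taken between features (columns) rather than between images, as the code in the question does. It is not the computation used above; it runs on small made-up data (8 features standing in for 3072) in Python 3 with NumPy, and the names `pca_white` and `zca_white` are my own. Note that the question's code stops at the PCA-whitened step, which lives in the rotated basis rather than in pixel space, and that may be one reason its output does not look like an image.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up correlated data: 2000 samples, 8 features (stand-in for N x 3072).
A = np.triu(np.ones((8, 8)))                # fixed mixing matrix
X = rng.normal(size=(2000, 8)) @ A
X = X - X.mean(axis=0)                      # zero-centre each feature

cov = np.cov(X, rowvar=False)               # (8, 8) feature covariance
U, S, _ = np.linalg.svd(cov)                # cov = U diag(S) U^T

epsilon = 1e-5
pca_white = (X @ U) / np.sqrt(S + epsilon)  # PCA whitening: rotated basis
zca_white = pca_white @ U.T                 # ZCA whitening: rotated back

# Largest deviation of the whitened covariance from the identity matrix.
print(np.abs(np.cov(zca_white, rowvar=False) - np.eye(8)).max())
```

Comparing the whole covariance matrix with the identity is a stricter check than testing where each row's maximum falls, and the small residual comes from epsilon.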

Update 29 January

I'm not entirely sure how to sort out the issues you're having; your trouble seems to lie in the shape of your raw data at the moment, so I would advise you to sort that out first before you try to move on to zero-centring and ZCA.

On the one hand, the first plot of the four plots in your update looks good, suggesting that you've loaded up the CIFAR data in the correct way. The second plot is produced by toimage, I think, which will automagically figure out which dimension has the colour data, which is a nice trick. On the other hand, the stuff that comes after that looks weird, so it seems something is going wrong somewhere. I confess I can't quite follow the state of your script, because I suspect you're working interactively (notebook), retrying things when they don't work (more on this in a second), and that you're using code that you haven't shown in your question. In particular, I'm not sure how you're loading the CIFAR data; your screenshot shows output from some print statements (Reading training data..., etc.), and then when you copy train_data into X and print the shape of X, the data has already been reshaped into (N, 3, 32, 32). Like I say, Update plot 1 would tend to suggest that the reshape has happened correctly. From plots 3 and 4, I think you're getting mixed up about matrix dimensions somewhere, so I'm not sure how you're doing the reshape and transpose.

Note that it's important to be careful with the reshape and transpose, for the following reason. The X = X.reshape(...) and X = X.transpose(...) code overwrites X with the rearranged result each time it runs. If you do this multiple times (say, by accidentally re-running a cell in the Jupyter notebook), you will shuffle the axes of your matrix over and over, and plotting the data will start to look really weird. This image shows the progression, as we iterate the reshape and transpose operations:

Increasing iterations of reshape and transpose

This progression does not cycle back, or at least, it doesn't cycle quickly. Because of periodic regularities in the data (like the 32-pixel row structure of the images), you tend to get banding in these improperly reshape-transposed images. I'm wondering if that's what's going on in the third of your four plots in your update, which looks a lot less random than the images in the original version of your question.
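One way to guard against this, sketched below in Python 3 with NumPy, is to keep the raw array under its own name and derive the viewable array through a function, so that re-running a cell always starts from the same input. The function name `to_channels_last` and the made-up `raw` array are assumptions for illustration.

```python
import numpy as np

def to_channels_last(flat):
    """Apply the reshape + transpose step once and return a new design matrix."""
    return flat.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1).reshape(-1, 3072)

# Made-up raw data: every value is distinct, so any shuffle is detectable.
raw = np.arange(2 * 3072).reshape(2, 3072)

once = to_channels_last(raw)
twice = to_channels_last(once)       # what happens if the step is run again

print(np.array_equal(once, twice))   # False: the second pass shuffles again
print(np.array_equal(twice, raw))    # False: it does not undo the first pass
```

Since `raw` is never overwritten, calling `to_channels_last(raw)` again always gives the same correctly ordered result.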

The fourth plot of your update is a colour negative of the peacock. I'm not sure how you're getting that, but I can reproduce your output with:

plt.imshow(255 - X[6].reshape(32,32,3))
plt.show()

which gives:

Colour negative of the peacock

One way you could get this is if you were using my show helper function, and you mixed up m and M, like this:

def show(i):
    i = i.reshape((32,32,3))
    m,M = i.min(), i.max()
    plt.imshow((i - M) / (m - M))  # this will produce a negative img
    plt.show()
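The reason the swap gives a negative is that (i - M) / (m - M) equals 1 - (i - m) / (M - m). This can be confirmed numerically with a short sketch on made-up data (Python 3 with NumPy; `positive` and `swapped` are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
i = rng.random((32, 32, 3))          # made-up image data
m, M = i.min(), i.max()

positive = (i - m) / (M - m)         # intended rescaling to [0, 1]
swapped = (i - M) / (m - M)          # m and M mixed up

# The swapped version is the photographic negative of the intended one.
print(np.allclose(swapped, 1.0 - positive))  # True
```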