How to calculate correlation coefficients using sklearn CCA module?

2024/10/4 14:45:51

I need to measure similarity between feature vectors using CCA module. I saw sklearn has a good CCA module available: https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html

In different papers I reviewed, I saw that the way to measure similarity using CCA is to calculate the mean of the correlation coefficients, for example as done in this following notebook example: https://github.com/google/svcca/blob/1f3fbf19bd31bd9b76e728ef75842aa1d9a4cd2b/tutorials/001_Introduction.ipynb

How to calculate the correlation coefficients (as shown in the notebook) using sklearn CCA module?

from sklearn.cross_decomposition import CCA
import numpy as npU = np.random.random_sample(500).reshape(100,5)
V = np.random.random_sample(500).reshape(100,5)cca = CCA(n_components=1)
cca.fit(U, V)cca.coef_.shape                   # (5,5)U_c, V_c = cca.transform(U, V)U_c.shape                         # (100,1)
V_c.shape                         # (100,1)

This is an example of the sklearn CCA module, however I have no idea how to retrieve correlation coefficients from it.

Answer

In reference to the notebook you provided which is a supporting artefact to and implements ideas from the following two papers

  1. "SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability". Neural Information Processing Systems (NeurIPS) 2017
  2. "Insights on Representational Similarity in Deep Neural Networks with Canonical Correlation". Neural Information Processing Systems (NeurIPS) 2018

The authors there calculate 50 = min(A_fake neurons, B_fake neurons) components and plot the correlations between the transformed vectors of each component (i.e. 50).

With the help of the below code, using sklearn CCA, I am trying to reproduce their Toy Example. As we'll see the correlation plots match. The sanity check they used in the notebook came very handy - it passed seamlessly with this code as well.

import numpy as np
from matplotlib import pyplot as plt
from sklearn.cross_decomposition import CCA# rows contain the number of samples for CCA and the number of rvs goes in columns
X = np.random.randn(2000, 100)
Y = np.random.randn(2000, 50)# num of components
n_comps = min(X.shape[1], Y.shape[1])
cca = CCA(n_components=n_comps)
cca.fit(X, Y)
X_c, Y_c = cca.transform(X, Y)# calculate and plot the correlations of all components
corrs = [np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1] for i in range(n_comps)]    
plt.plot(corrs)
plt.xlabel('cca_idx')
plt.ylabel('cca_corr')
plt.show()

Output:

enter image description here

For the sanity check, replace the Y data matrix by a scaled invertible transform of X and rerun the code.

Y = np.dot(X, np.random.randn(100, 100)) 

Output:

enter image description here

https://en.xdnf.cn/q/70570.html

Related Q&A

cant open shape file with GeoPandas

I dont seem to be able to open the zip3.zip shape file I download from (http://www.vdstech.com/usa-data.aspx)Here is my code:import geopandas as gpd data = gpd.read_file("data/zip3.shp")this …

How to fix the error :range object is not callable in python3.6 [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.This question was caused by a typo or a problem that can no longer be reproduced. While similar q…

pyspark: Save schemaRDD as json file

I am looking for a way to export data from Apache Spark to various other tools in JSON format. I presume there must be a really straightforward way to do it.Example: I have the following JSON file jfil…

How to start a background thread when django server is up?

TL;DR In my django project, where do I put my "start-a-thread" code to make a thread run as soon as the django server is up?First of all, Happy New Year Everyone! This is a question from a n…

Can I use SQLAlchemy relationships in ORM event callbacks? Always get None

I have a User model that resembles the following:class User(db.Model):id = db.Column(db.BigInteger, primary_key=True)account_id = db.Column(db.BigInteger, db.ForeignKey(account.id))account = db.relatio…

Elegant way to transform a list of dict into a dict of dicts

I have a list of dictionaries like in this example:listofdict = [{name: Foo, two: Baz, one: Bar}, {name: FooFoo, two: BazBaz, one: BarBar}]I know that name exists in each dictionary (as well as the oth…

sharpen image to detect edges/lines in a stamped X object on paper

Im using python & opencv. My goal is to detect "X" shaped pieces in an image taken with a raspberry pi camera. The project is that we have pre-printed tic-tac-toe boards, and must image t…

How to change the color of lines within a subplot?

My goal is to create a time series plot for each column in my data with their corresponding rolling mean. Id like the color of the lines across subplots to be different. For example, for gym and rollin…

Cythons calculations are incorrect

I implemented the Madhava–Leibniz series to calculate pi in Python, and then in Cython to improve the speed. The Python version:from __future__ import division pi = 0 l = 1 x = True while True:if x:pi…

Python: NLTK and TextBlob in french

Im using NLTK and TextBlob to find nouns and noun phrases in a text:from textblob import TextBlob import nltkblob = TextBlob(text) print(blob.noun_phrases) tokenized = nltk.word_tokenize(text) nouns =…