Mean of a correlation matrix - pandas DataFrame

2024/10/3 0:34:02

I have a large correlation matrix in a pandas DataFrame: df, with shape (342, 342).

How do I take the mean, sd, etc. of all of the numbers in the upper triangle not including the 1's along the diagonal?

Thank you.

Answer

Another potential one-line answer:

In [1]: corr
Out[1]:
          a         b         c         d         e
a  1.000000  0.022246  0.018614  0.022592  0.008520
b  0.022246  1.000000  0.033029  0.049714 -0.008243
c  0.018614  0.033029  1.000000 -0.016244  0.049010
d  0.022592  0.049714 -0.016244  1.000000 -0.015428
e  0.008520 -0.008243  0.049010 -0.015428  1.000000

In [2]: corr.values[np.triu_indices_from(corr.values, 1)].mean()
Out[2]: 0.016381
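
Since the question also asks for the sd and other statistics, the same indexing can be pulled out into a variable and reused. The following is only a minimal sketch: the random `data` frame is a hypothetical stand-in for the real 342 x 342 matrix, and the choice of `ddof=1` (sample standard deviation) is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; replace `corr` with your own correlation matrix.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
corr = data.corr()

# Row/column indices of the strict upper triangle.
# k=1 starts one step above the main diagonal, so the 1's are skipped.
values = corr.to_numpy()
rows, cols = np.triu_indices_from(values, k=1)
upper = values[rows, cols]

summary = {
    "count": upper.size,
    "mean": upper.mean(),
    "sd": upper.std(ddof=1),  # assumption: sample sd; use ddof=0 for population sd
    "min": upper.min(),
    "max": upper.max(),
}
print(summary)
```

For an n x n matrix, `upper` holds n * (n - 1) / 2 values, so a 342 x 342 matrix would give 58,311 correlations.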

Edit: added performance metrics

Performance of my solution:

In [3]: %timeit corr.values[np.triu_indices_from(corr.values,1)].mean()
10000 loops, best of 3: 48.1 us per loop

Performance of Theodros Zelleke's one-line solution:

In [4]: %timeit corr.unstack().ix[zip(*np.triu_indices_from(corr, 1))].mean()
1000 loops, best of 3: 823 us per loop

Performance of DSM's solution:

In [5]: def method1(df):
   ...:     df2 = df.copy()
   ...:     df2.values[np.tril_indices_from(df2)] = np.nan
   ...:     return df2.unstack().mean()
   ...:

In [5]: %timeit method1(corr)
1000 loops, best of 3: 242 us per loop
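
Note that `.ix`, used in the second benchmark, has been removed from recent pandas releases, and writing into `df2.values` in place may not work on newer versions either. As a rough sketch of a pandas-level alternative (not one of the benchmarked solutions, and with a hypothetical random `corr` as input), a boolean mask can keep the row/column labels of each pair:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; replace `corr` with your own correlation matrix.
rng = np.random.default_rng(1)
corr = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd")).corr()

# True only strictly above the diagonal.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Blank out everything else, then flatten to a Series indexed by (row, column).
# The explicit dropna() removes the masked cells regardless of how stack() treats NaN.
upper_series = corr.where(mask).stack().dropna()

print(upper_series.describe())  # count, mean, std, min, quartiles, max
```

This is likely slower than the plain NumPy indexing above, but the resulting Series shows which pair of variables each correlation belongs to.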
