How to compute Spearman correlation in TensorFlow

2024/9/28 1:16:27

Problem

I need to compute the Pearson and Spearman correlations and use them as metrics in TensorFlow.

For Pearson, it's trivial:

tf.contrib.metrics.streaming_pearson_correlation(y_pred, y_true)
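Note that tf.contrib was removed in TensorFlow 2.x, so this streaming metric only exists in 1.x. As a minimal sketch, assuming TensorFlow 2.x in eager mode (pearson_correlation is a hypothetical helper name, not a TensorFlow API), the same coefficient can be computed for a single batch straight from its definition:

```python
import tensorflow as tf


def pearson_correlation(x, y):
    """Pearson correlation of two tensors, flattened to 1-D (one batch, not streaming)."""
    x = tf.reshape(tf.cast(x, tf.float32), [-1])
    y = tf.reshape(tf.cast(y, tf.float32), [-1])
    # Centre both variables on their means
    xc = x - tf.reduce_mean(x)
    yc = y - tf.reduce_mean(y)
    cov = tf.reduce_sum(xc * yc)
    return cov / tf.sqrt(tf.reduce_sum(xc * xc) * tf.reduce_sum(yc * yc))


print(float(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])))  # 1.0 (perfectly linear)
```

Unlike the 1.x streaming metric, this value covers only the tensors passed in, not an accumulated total.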

But for Spearman, I am clueless!

What I tried:

From this answer:

    samples = 1
    predictions_rank = tf.nn.top_k(y_pred, k=samples, sorted=True, name='prediction_rank').indices
    real_rank = tf.nn.top_k(y_true, k=samples, sorted=True, name='real_rank').indices
    rank_diffs = predictions_rank - real_rank
    rank_diffs_squared_sum = tf.reduce_sum(rank_diffs * rank_diffs)
    six = tf.constant(6)
    one = tf.constant(1.0)
    numerator = tf.cast(six * rank_diffs_squared_sum, dtype=tf.float32)
    divider = tf.cast(samples * samples * samples - samples, dtype=tf.float32)
    spearman_batch = one - numerator / divider

But this returns NaN...
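The NaN comes from samples = 1: the divider becomes 1*1*1 - 1 = 0, and for (batch, 1) tensors (the shape the error message further down points to) top_k with k=1 returns index 0 for every row, so the numerator is 0 as well and the result is 0/0. Two further problems: samples should be the number of elements in the batch, and the indices returned by top_k are a sort order (an argsort), not ranks; the ranks are the argsort of that argsort. A minimal sketch with those corrections, assuming TensorFlow 2.x and inputs without ties (the 1 - 6*sum(d^2)/(n(n^2 - 1)) shortcut is only exact when all ranks are distinct); spearman_no_ties is a hypothetical name:

```python
import tensorflow as tf


def spearman_no_ties(y_true, y_pred):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when there are no ties."""
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    # argsort gives the sort order; argsort of that order gives each element's 0-based rank
    rank_true = tf.argsort(tf.argsort(y_true))
    rank_pred = tf.argsort(tf.argsort(y_pred))
    d = tf.cast(rank_pred - rank_true, tf.float32)
    n = tf.cast(tf.size(y_true), tf.float32)  # number of samples in the batch, not 1
    return 1.0 - 6.0 * tf.reduce_sum(d * d) / (n * (n * n - 1.0))


print(float(spearman_no_ties([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])))  # approximately 0.8
```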


Following the definition on Wikipedia:
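The definition in question is presumably the standard one: Spearman's coefficient is the Pearson correlation of the rank variables R(X) and R(Y), which is what the attempt below tries to compute. When all n ranks are distinct, it reduces to the shortcut used in the first attempt, with d_i the rank difference of the i-th pair:

```latex
r_s = \rho_{R(X),R(Y)} = \frac{\operatorname{cov}\big(R(X), R(Y)\big)}{\sigma_{R(X)}\,\sigma_{R(Y)}},
\qquad
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}, \quad d_i = R(X_i) - R(Y_i)
```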

I tried:

size = tf.size(y_pred)
indice_of_ranks_pred = tf.nn.top_k(y_pred, k=size)[1]
indice_of_ranks_label = tf.nn.top_k(y_true, k=size)[1]
rank_pred = tf.nn.top_k(-indice_of_ranks_pred, k=size)[1]
rank_label = tf.nn.top_k(-indice_of_ranks_label, k=size)[1]
rank_pred = tf.to_float(rank_pred)
rank_label = tf.to_float(rank_label)
spearman = tf.contrib.metrics.streaming_pearson_correlation(rank_pred, rank_label)

But running this, I got the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: input must have at least k columns. Had 1, needed 32

[[{{node metrics/spearman/TopKV2}} = TopKV2[T=DT_FLOAT, sorted=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lambda_1/add, metrics/pearson/pearson_r/variance_predictions/Size)]]
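The error comes from tf.nn.top_k sorting along the last axis. For (batch, 1) tensors that axis has only 1 column, while k = tf.size(y_pred) asks for 32 (the whole batch). A minimal sketch, assuming TensorFlow 2.x (so tf.cast replaces the removed tf.to_float and the Pearson step is written by hand instead of using tf.contrib), flattens the inputs first and keeps the double top_k trick for ranks; spearman_topk is a hypothetical name:

```python
import tensorflow as tf


def spearman_topk(y_true, y_pred):
    """Pearson correlation of ranks; tied values get arbitrary distinct ranks."""
    # Flatten (batch, 1) -> (batch,) so top_k sorts across the batch, not within one row
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    size = tf.size(y_pred)

    def ranks(x):
        order = tf.nn.top_k(x, k=size).indices  # element indices sorted by descending value
        # top_k of -order returns the inverse permutation, i.e. each element's rank (0 = largest)
        return tf.cast(tf.nn.top_k(-order, k=size).indices, tf.float32)

    rank_pred, rank_true = ranks(y_pred), ranks(y_true)
    pc = rank_pred - tf.reduce_mean(rank_pred)
    tc = rank_true - tf.reduce_mean(rank_true)
    return tf.reduce_sum(pc * tc) / tf.sqrt(tf.reduce_sum(pc * pc) * tf.reduce_sum(tc * tc))


y_true = tf.constant([[1.0], [2.0], [3.0], [4.0], [5.0]])  # shape (5, 1), as in the question
y_pred = tf.constant([[2.0], [1.0], [4.0], [3.0], [5.0]])
print(float(spearman_topk(y_true, y_pred)))  # approximately 0.8
```

Both inputs are ranked in the same (descending) direction, so the correlation equals the one computed from ascending ranks.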

Answer

One option is to wrap scipy.stats.spearmanr with TensorFlow's tf.py_function and declare the inputs and the output type, like this:

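import tensorflow as tf
from scipy.stats import spearmanr

def get_spearman_rankcor(y_true, y_pred):
    # spearmanr returns (correlation, p-value); hand only the correlation back as a float32 scalar
    return tf.py_function(
        lambda p, t: float(spearmanr(p.numpy().ravel(), t.numpy().ravel())[0]),
        [tf.cast(y_pred, tf.float32), tf.cast(y_true, tf.float32)],
        Tout=tf.float32)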
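Because spearmanr runs outside TensorFlow, no gradient flows through it, so this works as a metric but not as a loss. If you would rather avoid the Python callback and the SciPy dependency, a hedged alternative (assuming TensorFlow 2.x; average_ranks and spearman_avg_ranks are hypothetical names) stays in TensorFlow ops and gives tied values the mean of their ranks, as spearmanr does, by counting pairwise comparisons. It needs O(n^2) memory per batch, which is usually acceptable for batch-sized inputs:

```python
import tensorflow as tf


def average_ranks(x):
    """1-based ranks of a 1-D tensor, giving tied values the mean of their ranks."""
    less = tf.cast(tf.less(x[None, :], x[:, None]), tf.float32)    # less[i, j] = x_j < x_i
    equal = tf.cast(tf.equal(x[None, :], x[:, None]), tf.float32)  # includes j == i
    return tf.reduce_sum(less, axis=1) + (tf.reduce_sum(equal, axis=1) + 1.0) / 2.0


def spearman_avg_ranks(y_true, y_pred):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rt = average_ranks(tf.reshape(tf.cast(y_true, tf.float32), [-1]))
    rp = average_ranks(tf.reshape(tf.cast(y_pred, tf.float32), [-1]))
    rt = rt - tf.reduce_mean(rt)
    rp = rp - tf.reduce_mean(rp)
    return tf.reduce_sum(rt * rp) / tf.sqrt(tf.reduce_sum(rt * rt) * tf.reduce_sum(rp * rp))


# Ranks [1, 2.5, 2.5, 4] vs [1, 2, 3, 4] give sqrt(0.9), about 0.9487
print(float(spearman_avg_ranks([1.0, 2.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])))
```

Either function can be passed to model.compile(..., metrics=[...]). Keras averages function metrics over batches, so the reported value is the mean of per-batch correlations rather than the correlation over the whole dataset.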
https://en.xdnf.cn/q/71397.html
