I stumbled across the definition of mse
in Keras and I can't seem to find an explanation.
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
I was expecting the mean to be taken across the batch dimension, which is axis=0, but instead it is axis=-1.
I also played around with it a little to see if K.mean actually behaves like numpy.mean.
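Roughly what I tried (a sketch using K.variable and K.eval from keras.backend to get concrete values out of the graph; the dtype just comes from Keras' default floatx):

>>> import numpy as np
>>> from keras import backend as K
>>> x = np.arange(6).reshape(2, 3).astype('float32')
>>> np.mean(x, axis=-1)
array([ 1.,  4.], dtype=float32)
>>> K.eval(K.mean(K.variable(x), axis=-1))
array([ 1.,  4.], dtype=float32)

Both give one value per row, i.e. the mean over the last axis, so as far as I can tell they agree.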
I must have misunderstood something. Can somebody please clarify?
I can't actually take a look inside the cost function at run time, right? As far as I know, the function is called at compile time, which prevents me from evaluating concrete values.
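The best I could come up with for checking it outside of training (a sketch, reusing the definition from above with backend variables so K.eval can return concrete numbers; the shapes are just made up to match the example below):

>>> import numpy as np
>>> from keras import backend as K
>>> def mean_squared_error(y_true, y_pred):
...     return K.mean(K.square(y_pred - y_true), axis=-1)
...
>>> y_true = K.variable(np.zeros((10, 1)))
>>> y_pred = K.variable(np.ones((10, 1)))
>>> K.eval(mean_squared_error(y_true, y_pred)).shape
(10,)

So the loss comes back with one value per sample rather than as a single scalar, which is exactly what confuses me.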
I mean... imagine doing regression and having a single output neuron and training with a batch size of ten.
>>> import numpy as np
>>> a = np.ones((10, 1))
>>> a
array([[ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.]])
>>> np.mean(a, axis=-1)
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
All it does is take the mean over the last axis (which here has size 1), so I get back one value per prediction instead of the mean over all the predictions in the batch.
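For contrast, taking the mean over axis=0 in the same session does collapse the batch dimension:

>>> np.mean(a, axis=0)
array([ 1.])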