Question 1

I'm training a U-Net CNN in Keras/TensorFlow and find that the loss drops massively between the last batch of the first epoch and the first batch of the second epoch:

Epoch 00001: loss improved from inf to 0.07185 - categorical_accuracy: 0.8636
Epoch 2/400: 1/250 [.....................] - loss: 0.0040 - categorical_accuracy: 0.8878

Weirdly, categorical accuracy does not drop with the loss but increases slightly. After the drop, the loss doesn't decrease further but settles around the lower value. I know this is very little information about the problem, but might this behaviour indicate a common problem that I can investigate further?

Some extra info: Optimizer = Adam(lr=1e-4) (lowering lr didn't seem to help)

Loss: 'class weighted categorical cross entropy', calculated as follows

def class_weighted_categorical_crossentropy(class_weights):
    def loss_function(y_true, y_pred):
        # scale preds so that the class probas of each sample sum to 1
        y_pred /= tf.reduce_sum(y_pred, -1, True)
        # manual computation of crossentropy
        epsilon = tf.convert_to_tensor(K.epsilon(), y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
        # Multiply each class by its weight:
        classes_list = tf.unstack(y_true * tf.math.log(y_pred), axis=-1)
        for i in range(len(classes_list)):
            classes_list[i] = tf.scalar_mul(class_weights[i], classes_list[i])
        # Return weighted sum:
        return - tf.reduce_sum(tf.stack(classes_list, axis=-1), -1)
    return loss_function

Any ideas/sanity checks are much appreciated!

EDIT: This is the loss plot for training. I didn't have time to neaten it up; it shows loss per step, not per epoch, and you can see the shift to epoch 2 after 250 steps. Up until that point the loss curve looks very good, but the shift to epoch two seems strange.

Answer 1

That sounds right to me. Remember that loss and accuracy generally move in opposite directions: as the loss decreases, accuracy tends to increase, although the two are not tied one-to-one.
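To illustrate how the loss can fall a lot while accuracy barely moves, here is a small NumPy sketch. The two prediction arrays are made-up toy values, not taken from the question, and the helper functions are plain unweighted versions written for this example rather than the asker's class-weighted loss.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy; y_true is one-hot, y_pred holds class probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose arg-max prediction matches the true class."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))

# Four samples, two classes (toy data, assumed for illustration).
y_true = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

# Same arg-max decisions in both cases; only the confidence differs.
hesitant = np.array([[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.6, 0.4]])
confident = np.array([[0.99, 0.01], [0.99, 0.01], [0.01, 0.99], [0.6, 0.4]])

for name, y_pred in [("hesitant", hesitant), ("confident", confident)]:
    loss = categorical_crossentropy(y_true, y_pred)
    acc = categorical_accuracy(y_true, y_pred)
    print(f"{name}: loss={loss:.4f} accuracy={acc:.2f}")
```

Both sets of predictions classify the same three of four samples correctly, so accuracy is identical, but the more confident predictions give a much lower cross-entropy.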

My understanding is that, during the first epoch, you basically have a neural network with a more-or-less random initial state. Over the course of the first epoch, the weights are adjusted repeatedly to minimize the loss function (which, as stated above, usually goes hand in hand with improving accuracy). So, at the beginning of the second epoch, your loss should be a lot better (i.e. lower). That means your neural network is learning.
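One more thing that may make the jump look bigger than it is: as far as I know, the loss Keras prints during an epoch is a running average over the batches seen so far in that epoch, and the average starts over at each new epoch. The sketch below imitates that with an invented per-batch loss curve; the decay shape and the constants are assumptions for illustration, and only the 250 steps per epoch comes from the question.

```python
import numpy as np

def running_mean(values):
    """Mean of values[0..i] for every i, like a per-epoch averaged metric."""
    values = np.asarray(values, dtype=float)
    return np.cumsum(values) / np.arange(1, len(values) + 1)

steps_per_epoch = 250  # from the question's progress bar

# Hypothetical per-batch losses in epoch 1: fast decay towards a plateau.
steps = np.arange(steps_per_epoch)
batch_losses = 0.004 + 0.5 * np.exp(-steps / 20.0)

# What an averaged progress-bar value would look like during epoch 1.
reported = running_mean(batch_losses)

print(f"loss of the last batch in epoch 1:  {batch_losses[-1]:.4f}")
print(f"averaged loss at the end of epoch 1: {reported[-1]:.4f}")
```

With numbers like these, the averaged value at the end of epoch 1 is still dominated by the early, high-loss batches, while the first value shown in epoch 2 reflects only the current weights. That would be consistent with a loss that "drops" at the epoch boundary and then stays near the lower value.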

Why does Keras loss drop dramatically after the first epoch?
