I'm training a U-Net CNN in Keras/TensorFlow and find that the loss drops massively between the last batch of the first epoch and the first batch of the second epoch:
    Epoch 00001: loss improved from inf to 0.07185 - categorical_accuracy: 0.8636
    Epoch 2/400: 1/250 [.....................] - loss: 0.0040 - categorical_accuracy: 0.8878
Weirdly, categorical accuracy does not jump together with the loss; it only increases slightly. After the drop, the loss doesn't decrease further but settles around the lower value. I know this is very little information on the problem, but might this behaviour indicate a common problem I could investigate further?
Some extra info: Optimizer = Adam(lr=1e-4) (lowering the learning rate didn't seem to help).
Loss: class-weighted categorical cross-entropy, calculated as follows:
    import tensorflow as tf
    from tensorflow.keras import backend as K

    def class_weighted_categorical_crossentropy(class_weights):
        def loss_function(y_true, y_pred):
            # Scale predictions so that the class probabilities of each sample sum to 1
            y_pred /= tf.reduce_sum(y_pred, -1, True)
            # Manual computation of cross-entropy
            epsilon = tf.convert_to_tensor(K.epsilon(), y_pred.dtype.base_dtype)
            y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
            # Multiply each class by its weight
            classes_list = tf.unstack(y_true * tf.math.log(y_pred), axis=-1)
            for i in range(len(classes_list)):
                classes_list[i] = tf.scalar_mul(class_weights[i], classes_list[i])
            # Return the (negative) weighted sum over classes
            return -tf.reduce_sum(tf.stack(classes_list, axis=-1), -1)
        return loss_function
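For reference, with uniform class weights this should reduce to Keras's built-in categorical cross-entropy, which makes for a quick sanity check of the implementation. A minimal sketch (the toy tensors and the [1.0, 1.0, 1.0] weights are purely illustrative):

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import backend as K

    # With uniform weights the custom loss should match the built-in one
    loss_fn = class_weighted_categorical_crossentropy([1.0, 1.0, 1.0])

    y_true = tf.constant([[0., 1., 0.], [1., 0., 0.]])
    y_pred = tf.constant([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])

    custom = loss_fn(y_true, y_pred)
    builtin = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
    print(np.allclose(K.eval(custom), K.eval(builtin)))  # should print True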
Any ideas/sanity checks are much appreciated!
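One check I've been considering myself: re-evaluating the loss on a fixed batch at the end of each epoch, to see whether the value Keras logs for the epoch is an average over the whole epoch rather than the loss of the final weights. A minimal sketch, where x_batch/y_batch are placeholders for a fixed sample of my training data:

    import tensorflow as tf

    class EpochEndLossCheck(tf.keras.callbacks.Callback):
        """Re-evaluates the loss on a fixed batch with the end-of-epoch weights."""

        def __init__(self, x_batch, y_batch):  # placeholders for a fixed training sample
            super().__init__()
            self.x_batch = x_batch
            self.y_batch = y_batch

        def on_epoch_end(self, epoch, logs=None):
            results = self.model.evaluate(self.x_batch, self.y_batch, verbose=0)
            loss = results[0] if isinstance(results, (list, tuple)) else results
            # If this is much lower than the logged epoch loss, the logged value
            # is an average over the epoch, not the loss of the final weights.
            print('epoch {}: logged loss {:.4f}, re-evaluated loss {:.4f}'.format(
                epoch + 1, logs['loss'], loss))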
EDIT: This is the loss plot for training (I didn't have time to neaten it up). It's loss plotted per step, not per epoch, and you can see the shift to epoch 2 after 250 steps. Up until that point the loss curve looks very good, but the shift to epoch 2 seems strange.
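In case it helps to reproduce the plot, per-step losses can be collected with a simple callback along these lines (a sketch; note that depending on the Keras version, the 'loss' passed to batch callbacks may itself be a running average over the epoch rather than the raw loss of that batch):

    import tensorflow as tf

    class StepLossHistory(tf.keras.callbacks.Callback):
        """Collects the loss Keras reports after every training batch."""

        def on_train_begin(self, logs=None):
            self.step_losses = []

        def on_batch_end(self, batch, logs=None):
            self.step_losses.append(logs.get('loss'))

Passing an instance via callbacks=[StepLossHistory()] to model.fit and plotting step_losses should give a curve like the one above.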