Cross entropy loss suddenly increases to infinity

2024/11/17 1:44:03

I am attempting to replicate an deep convolution neural network from a research paper. I have implemented the architecture, but after 10 epochs, my cross entropy loss suddenly increases to infinity. This can be seen in the chart below. You can ignore what happens to the accuracy after the problem occurs.

Here is the github repository with a picture of the architecture

After doing some research I think using an AdamOptimizer or relu might be a problem.

x = tf.placeholder(tf.float32, shape=[None, 7168])
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])#Many Convolutions and Relus omittedfinal = tf.reshape(final, [-1, 7168])
keep_prob = tf.placeholder(tf.float32)
W_final = weight_variable([7168,7168,3])
b_final = bias_variable([7168,3])
final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_finalcross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=final_conv))
train_step = tf.train.AdamOptimizer(1e-5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(final_conv, 2), tf.argmax(y_, 2))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

EDIT If anyone is interested, the solution was that I was basically feeding in incorrect data.

Answer

Solution: Control the solution space. This might mean using smaller datasets when training, it might mean using less hidden nodes, it might mean initializing your wb differently. Your model is reaching a point where the loss is undefined, which might be due to the gradient being undefined, or the final_conv signal.

Why: Sometimes no matter what, a numerical instability is reached. Eventually adding a machine epsilon to prevent dividing by zero (cross entropy loss here) just won't help because even then the number cannot be accurately represented by the precision you are using. (Ref: https://en.wikipedia.org/wiki/Round-off_error and https://floating-point-gui.de/basic/)

Considerations:
1) When tweaking epsilons, be sure to be consistent with your data type (Use the machine epsilon of the precision you are using, in your case float32 is 1e-6 ref: https://en.wikipedia.org/wiki/Machine_epsilon and python numpy machine epsilon.

2) Just in-case others reading this are confused: The value in the constructor for Adamoptimizer is the learning rate, but you can set the epsilon value (ref: How does paramater epsilon affects AdamOptimizer? and https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer)

3) Numerical instability of tensorflow is there and its difficult to get around. Yes there is tf.nn.softmax_with_cross_entropy but this is too specific (what if you don't want a softmax?). Refer to Vahid Kazemi's 'Effective Tensorflow' for an insightful explanation: https://github.com/vahidk/EffectiveTensorflow#entropy

https://en.xdnf.cn/q/71273.html

Related Q&A

Converting each element of a list to tuple

to convert each element of list to tuple like following : l = [abc,xyz,test]convert to tuple list: newl = [(abc,),(xyz,),(test,)]Actually I have dict with keys like this so for searching purpose I need…

Python, Zeep response to pandas

I am tryng to conenct to a SOAP webservice and use pandas to put in on a table.Zeep give me this list:[{ssPeca: 103,ssQtd: 1,ssUn: un }, {ssPeca: 291A,ssQtd: 8,ssUn: un }, {ssPeca: 406B,ssQtd: 8,ssUn: …

Adjust the distance only between two subplots in matplotlib

I have 3 subplots (3 rows and 1 column). We can use fig.subplots_adjust(hspace=0.2) to adjust the distance between the subplots. this will change the distance between subplots for all case. How can I h…

Many-to-many multi-database join with Flask-SQLAlchemy

Im trying to make this many-to-many join work with Flask-SQLAlchemy and two MySQL databases, and its very close except its using the wrong database for the join table. Heres the basics... Ive got main_…

Merge two rows in the same Dataframe if their index is the same?

I have created a large Dataframe by pulling data from an Azure database. The construction of the dataframe wasnt simple as I had to do it in parts, using the concat function to add new columns to the d…

How do I use the Postgresql ANY operator in a NOT IN statement

Using Pyscopg2, how do I pass a Python list into an SQL statement using the ANY Operator?Normal Working SQL reads (See SQL Fiddle):SELECT * FROM student WHERE id NOT IN (3);Using Psycopg2 as below:Psy…

pass command-line arguments to runpy

I have two files, one which has a side effect I care about that occurs within the if __name__ == "__main__" guard:# a.py d = {} if __name__ == "__main__":d[arg] = helloThe second fi…

Altering my python path: helloworld.py returns command not found—

Massive apologies for this embarrassing question—Im using my MacBook Pro, running snow leopard, and using Python 2.7.1. Trying to run my first script and all the first pages of all my tutorials are la…

ImportError: cannot import name force_text

I have installed Python 2.7 and Django 1.4 in my CentOS machine and installed all dependencies for my existing project. When I run python manage.py runserver, I am getting the following traceback in my…

How can I select the pixels that fall within a contour in an image represented by a numpy array?

VI have a set of contour points drawn on an image which is stored as a 2D numpy array. The contours are represented by 2 numpy arrays of float values for x and y coordinates each. These coordinates are…