CUDNN_STATUS_NOT_INITIALIZED when trying to run TensorFlow

2024/9/16 23:11:51

I have installed TensorFlow 1.7 on Ubuntu 16.04 with CUDA 9.0, cuDNN 7.0.5 and vanilla Python 2.7. The samples for both CUDA and cuDNN run fine, and TensorFlow sees the GPU (so some TensorFlow examples run), but the examples that use cuDNN (like most CNN examples) do not. They fail with these log messages:

2018-04-10 16:14:17.013026: I tensorflow/stream_executor/plugin_registry.cc:243] Selecting default DNN plugin, cuDNN
2018-04-10 16:14:17.013100: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-04-10 16:14:17.013119: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.130  Wed Mar 21 03:37:26 PDT 2018
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)
"""
2018-04-10 16:14:17.013131: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:112] version string "384.130" made value 384.130.0
2018-04-10 16:14:17.013135: E tensorflow/stream_executor/cuda/cuda_dnn.cc:411] possibly insufficient driver version: 384.130.0
2018-04-10 16:14:17.013139: E tensorflow/stream_executor/cuda/cuda_dnn.cc:370] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-04-10 16:14:17.013143: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

Turning on a flood of VLOG messages (see my link below for how to do this) did not produce any additional relevant messages.
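For readers who want to reproduce the verbose logging: the post does not spell out the method used, so the following is only a sketch of one common approach, assuming the TensorFlow 1.x environment variables TF_CPP_MIN_LOG_LEVEL and TF_CPP_MIN_VLOG_LEVEL. The helper name enable_tf_vlog is hypothetical.

```python
import os


def enable_tf_vlog(level=2):
    """Ask TensorFlow's C++ runtime for verbose (VLOG) output.

    Hypothetical helper: it only sets environment variables, so it must
    run before TensorFlow is imported for the settings to take effect.
    """
    # 0 keeps INFO-level messages visible; VLOG output is emitted at INFO level.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"
    # Higher values produce more detailed VLOG output.
    os.environ["TF_CPP_MIN_VLOG_LEVEL"] = str(level)


enable_tf_vlog(2)
# import tensorflow as tf  # import only after the variables are set
```

The same variables can also be exported in the shell before starting Python.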

The key message here might be "Selecting default DNN plugin, cuDNN". From reading the code, I suspect it means TensorFlow cannot load the cuDNN library modules, but for all I know the message is perfectly normal (it is logged as info, not as a warning) and the problem is something else.

For example, in an earlier version the "CUDNN_STATUS_NOT_INITIALIZED" message seems to have been caused by TF allocating GPU memory too aggressively ahead of time, so that cuDNN could not initialize (I found this in the TF GitHub issues list). I tried those remedies (including resetting the GPU and rebooting), but they did not help.
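The question does not list exactly which memory remedies were tried. One commonly suggested one, shown here as a minimal sketch that assumes the TensorFlow 1.x API, is to let the session grow its GPU memory on demand instead of reserving it up front. The function name make_growth_session is hypothetical.

```python
def make_growth_session():
    """Create a TF 1.x session that allocates GPU memory on demand."""
    # Imported lazily; assumes TensorFlow 1.x (tf.ConfigProto, tf.Session).
    import tensorflow as tf

    config = tf.ConfigProto()
    # Grow the GPU allocation as needed rather than claiming
    # nearly all GPU memory when the session starts.
    config.gpu_options.allow_growth = True
    return tf.Session(config=config)
```

A script would call make_growth_session() once and run its graph in the returned session. If the error persists with this setting, as it did here, memory pressure is probably not the cause.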

Any ideas as to what I should try next?

Answer

OK, I found it. The problem was caused by my having the wrong version of cuDNN installed, so my suspicion that TensorFlow was not actually finding the correct shared library was right.

Basically, I had installed cuDNN v7.1.2 for CUDA 9.1 instead of cuDNN v7.1.2 for CUDA 9.0, which seems to have caused it to fail silently, although I would have expected an error message at this point. Note that I had detailed VLOGs running (see my answer on this post for more information on how to do that: Turning on TF Logs).

When I installed cuDNN v7.1.2 for CUDA 9.0, TensorFlow did in fact find it and complained that the version was not new enough, when in fact the real problem was that it was not old enough. But at least I had some real data to work with.

In the end, cuDNN v7.0.5 for CUDA 9.0 was what I needed, and that worked.
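To confirm which cuDNN release is actually installed, one option is to read the version macros from the cuDNN header. The sketch below assumes a cuDNN 7.x cudnn.h that defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. The function name and the sample header text are illustrative only, and the header does not reveal which CUDA version the library was built for.

```python
from __future__ import print_function

import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("macro %s not found in header" % name)
        parts.append(int(match.group(1)))
    return tuple(parts)


# Illustrative header excerpt, not copied from a real installation.
sample_header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
print(parse_cudnn_version(sample_header))
```

On a real machine, the text would come from the installed header instead, for example /usr/local/cuda/include/cudnn.h (an assumed location that depends on how cuDNN was installed).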

https://en.xdnf.cn/q/72634.html
