Keras sees my GPU but doesn't use it when training a neural network

2024/10/11 4:26:41

My GPU is not used by Keras/TensorFlow.

To try to get my GPU working with TensorFlow, I installed tensorflow-gpu via pip (I am using Anaconda on Windows).

I have an NVIDIA GTX 1080 Ti.

print(tf.test.is_gpu_available())
True
print(tf.config.experimental.list_physical_devices())
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

I tried

physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

but it didn't help.

sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))
print(sess)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
<tensorflow.python.client.session.Session object at 0x000001A2A3BBACF8>

The only warning from TensorFlow:

W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows 

Entire log:

2019-10-18 20:06:26.094049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-10-18 20:06:35.078225: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-10-18 20:06:35.090832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-10-18 20:06:35.180744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-10-18 20:06:35.185505: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-18 20:06:35.189328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-18 20:06:35.898592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-18 20:06:35.901683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-10-18 20:06:35.904235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-10-18 20:06:35.906687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-18 20:06:38.694481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-10-18 20:06:38.700482: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-18 20:06:38.704020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
[I 20:06:47.324 NotebookApp] Saving file at /Untitled.ipynb
2019-10-18 20:07:22.227110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-10-18 20:07:22.246012: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-18 20:07:22.261643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-18 20:07:22.272150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-18 20:07:22.275457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-10-18 20:07:22.277980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-10-18 20:07:22.316260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2019-10-18 20:07:32.986802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-10-18 20:07:32.990509: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-18 20:07:32.993763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-18 20:07:32.995570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-18 20:07:32.997920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-10-18 20:07:32.999435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-10-18 20:07:33.001380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-18 20:07:36.048204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-10-18 20:07:37.971703: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-10-18 20:07:38.576861: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll

I also tried reinstalling tensorflow-gpu with pip.

Why do I think the GPU isn't being used? Because my Python kernel uses 99% CPU and 99% RAM, while GPU usage is sometimes ~7% but most of the time 0.
I use a custom data generator, but for now it only selects batches and resizes them (skimage.transform.resize). One epoch takes ~44 s. Training also behaves strangely: it freezes at random points roughly every 10 samples, and freezes for a long time (~10-15 s) on the last sample (37/38).

Edit:

Here is how I create my custom data generator:

train_gen = DataGenerator(x=x_train, y=y_train, batch_size=128, target_shape=(100, 100, 3), sample_std=False, feature_std=False, proj_parameters=None, blur_parameters=None, nois_parameters=None, flip_parameters=None, gamm_parameters=None)

The validation generator is the same.

Update:

So it's the generator that causes the problem, but how can I fix it?
I used only skimage and numpy operations.

Answer

The logs are showing that the GPU does get used. You are almost certainly running into an IO bottleneck: your GPU is processing whatever the CPU is throwing at it way faster than the CPU can load and preprocess it. This is very common in deep learning, and there are ways to address it.
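One way to check this before changing anything is to time the generator on its own, with no model involved. The sketch below uses only the standard library; `time_batches` and `slow_batches` are hypothetical names introduced for illustration, and `slow_batches` is just a stand-in for your generator.

```python
import time


def time_batches(batch_iter, n_batches=10):
    """Return the average number of seconds spent waiting for each batch."""
    it = iter(batch_iter)
    waits = []
    for _ in range(n_batches):
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        waits.append(time.perf_counter() - start)
    return sum(waits) / len(waits) if waits else 0.0


def slow_batches(n, delay=0.01):
    """Hypothetical stand-in for a CPU-bound data generator."""
    for i in range(n):
        time.sleep(delay)  # pretend to load and resize a batch
        yield i


avg_wait = time_batches(slow_batches(5), n_batches=5)
print(f"average wait per batch: {avg_wait:.3f} s")
```

With your numbers (about 44 s per epoch over 38 batches, so a bit over 1 s per batch), passing `train_gen` instead of `slow_batches(5)` should tell you quickly: if the average wait per batch is close to that figure, nearly all of the epoch time is spent producing data rather than training.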

We cannot provide a lot of help without knowing more about your data pipeline (byte size of a batch, preprocessing steps, ...) and how the data is stored. One typical way to speed things up is to store the data in a binary format, like TFRecords, so that the CPU can load it faster. See the official documentation for this.
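As a rough illustration of the TFRecords idea, here is a minimal sketch that writes a few images to a TFRecord file and reads them back. It assumes TensorFlow 2.x and images that are already uint8 arrays of shape (100, 100, 3); the random data, the file location and the feature names "image" and "label" are placeholders I made up, not something taken from your code.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Placeholder data standing in for already-resized training images.
images = np.random.randint(0, 256, size=(4, 100, 100, 3), dtype=np.uint8)
labels = np.array([0, 1, 2, 3], dtype=np.int64)
path = os.path.join(tempfile.mkdtemp(), "train.tfrecord")

# Write one tf.train.Example per image: raw pixel bytes plus an integer label.
with tf.io.TFRecordWriter(path) as writer:
    for image, label in zip(images, labels):
        example = tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[image.tobytes()])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[int(label)])),
        }))
        writer.write(example.SerializeToString())

feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse(serialized):
    """Decode one serialized Example back into an (image, label) pair."""
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_raw(parsed["image"], tf.uint8)
    image = tf.reshape(image, (100, 100, 3))
    return image, parsed["label"]


dataset = tf.data.TFRecordDataset(path).map(parse).batch(2)
first_images, first_labels = next(iter(dataset))
print(first_images.shape, first_labels.numpy())
```

Storing raw bytes keeps decoding cheap but makes the files large; encoding each image as JPEG or PNG before writing is a common alternative when disk size matters.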


Edit: I quickly went through your input pipeline. The issue is indeed very likely to be IO:

  • You should run the preprocessing steps on the GPU as well; plenty of the augmentation techniques you use are implemented in tf.image. If you can, you should think about using TensorFlow 2.0, because it includes Keras and there are plenty of helpers in there as well.
  • Check out the tf.data.Dataset API. It has plenty of helpers to load the data in different threads, which can speed up the process by roughly a factor of the number of cores you have.
  • You should store your images as TFRecords. This is likely to speed up the loading by an order of magnitude if your input images are smallish.
  • You could probably try larger batch sizes as well; I'm thinking your images are probably really small.
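To make the tf.data and tf.image suggestions above more concrete, here is a minimal sketch of an input pipeline that resizes with TensorFlow ops, maps in parallel and prefetches batches. It assumes TensorFlow 2.4 or newer (on older versions, use tf.data.experimental.AUTOTUNE instead of tf.data.AUTOTUNE) and that the images fit in memory as a uint8 array; the random arrays and the single flip augmentation are placeholders, since I don't know what your generator does internally.

```python
import numpy as np
import tensorflow as tf

# Placeholder data; replace with the real x_train / y_train arrays.
x_train = np.random.randint(0, 256, size=(256, 64, 64, 3), dtype=np.uint8)
y_train = np.random.randint(0, 10, size=(256,), dtype=np.int64)


def preprocess(image, label):
    # Convert uint8 pixels to float32 in [0, 1], then resize with a TF op
    # instead of skimage.
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, (100, 100))
    # Example augmentation from tf.image (assumed, not your actual settings).
    image = tf.image.random_flip_left_right(image)
    return image, label


train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(buffer_size=len(x_train))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # parallel preprocessing
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the GPU trains
)

batch_images, batch_labels = next(iter(train_ds))
print(batch_images.shape, batch_labels.shape)
```

In TensorFlow 2.x the dataset can be passed directly to model.fit(train_ds, epochs=...). Note that functions passed to Dataset.map normally execute on the CPU, so the gain in this sketch comes mainly from the parallel map and from prefetching overlapping with training, not from moving the resize onto the GPU.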