How to clear GPU memory after PyTorch model training without restarting kernel

2024/11/20 19:37:43

I am training PyTorch deep learning models on a Jupyter-Lab notebook, using CUDA on a Tesla K80 GPU to train. While doing training iterations, the 12 GB of GPU memory are used. I finish training by saving the model checkpoint, but want to continue using the notebook for further analysis (analyze intermediate results, etc.).

However, these 12 GB continue being occupied (as seen from nvtop) after finishing training. I would like to free up this memory so that I can use it for other notebooks.

My solution so far is to restart this notebook's kernel, but that is not solving my issue because I can't continue using the same notebook and its respective output computed so far.

Answer

The answers so far are correct for the Cuda side of things, but there's also an issue on the ipython side of things.

When you have an error in a notebook environment, the ipython shell stores the traceback of the exception so you can access the error state with %debug. The issue is that this requires holding all variables that caused the error to be held in memory, and they aren't reclaimed by methods like gc.collect(). Basically all your variables get stuck and the memory is leaked.

Usually, causing a new exception will free up the state of the old exception. So trying something like 1/0 may help. However things can get weird with Cuda variables and sometimes there's no way to clear your GPU memory without restarting the kernel.

For more detail see these references:

https://github.com/ipython/ipython/pull/11572

How to save traceback / sys.exc_info() values in a variable?

https://en.xdnf.cn/q/26286.html

Related Q&A

cryptography is required for sha256_password or caching_sha2_password

Good day. Hope your all are well. Can someone help me with fix this? Im new to the MySQL environment. Im trying to connect to MySQL Database remotely. I used the following python code and got this err…

Django: How to access original (unmodified) instance in post_save signal

I want to do a data denormalization for better performance, and put a sum of votes my blog post receives inside Post model:class Post(models.Model):""" Blog entry """autho…

timeit and its default_timer completely disagree

I benchmarked these two functions (they unzip pairs back into source lists, came from here): n = 10**7 a = list(range(n)) b = list(range(n)) pairs = list(zip(a, b))def f1(a, b, pairs):a[:], b[:] = zip(…

How to pickle and unpickle to portable string in Python 3

I need to pickle a Python3 object to a string which I want to unpickle from an environmental variable in a Travis CI build. The problem is that I cant seem to find a way to pickle to a portable string …

Using RabbitMQ is there a way to look at the queue contents without a dequeue operation?

As a way to learn RabbitMQ and python Im working on a project that allows me to distribute h264 encodes between a number of computers. The basics are done, I have a daemon that runs on Linux or Mac th…

Pip default behavior conflicts with virtualenv?

I was following this tutorial When I got to virtualenv flask command, I received this error message: Can not perform a --user install. User site-packages are not visible in this virtualenv.This makes s…

How to update user password in Django Rest Framework?

I want to ask that following code provides updating password but I want to update password after current password confirmation process. So what should I add for it? Thank you.class UserPasswordSeriali…

ImportError: No module named xlrd

I am currently using PyCharm with Python version 3.4.3 for this particular project.This PyCharm previously had Python2.7, and I upgraded to 3.4.3.I am trying to fetch data from an Excel file using Pand…

How do I automatically fix lint issues reported by pylint?

Just like we have "eslint --fix" to automatically fix lint problems in Javascript code, do we have something for pylint too for Python code?

Conversion from JavaScript to Python code? [closed]

Closed. This question is seeking recommendations for books, tools, software libraries, and more. It does not meet Stack Overflow guidelines. It is not currently accepting answers.We don’t allow questi…