What is Python's strategy for managing the allocation and freeing of large variables?


As a follow-up to this question, it appears that there are different allocation/deallocation strategies for small and large objects in (C)Python.
More precisely, there seems to be a boundary in object size above which the memory used by the allocated object can be given back to the OS; below this size, the memory is not given back to the OS.

To quote the answer taken from the Numpy policy for releasing memory:

The exception is that for large single allocations (e.g. if you create a multi-megabyte array), a different mechanism is used. Such large memory allocations can be released back to the OS. So it might specifically be the non-numpy parts of your program that are producing the issues you see.

Indeed, these two allocation strategies are easy to demonstrate. For example:

  • 1st strategy: no memory is given back to the OS
import numpy as np
import psutil
import gc

# Allocate array
x = np.random.uniform(0, 1, size=(10**4))

# Free it and force a garbage collection
del x
gc.collect()

# We go from 41295.872 KB to 41295.872 KB
# using psutil.Process().memory_info().rss / 10**3; same behavior for VMS

=> No memory given back to the OS

  • 2nd strategy: freed memory is given back to the OS

When doing the same experiment, but with a bigger array:

x = np.random.uniform(0, 1, size=(10**5))
del x
gc.collect()
# We go from 41582.592 KB to 41017.344 KB

=> Memory is released to the OS

It seems that objects bigger than approximately 8*10**4 bytes get allocated using the 2nd strategy; the sketch below probes the boundary more finely.
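For reference, a minimal sketch to locate the boundary empirically (assumptions: float64 arrays at 8 bytes per element, and that RSS is a good enough proxy; the measurements are noisy, so treat the output as a rough indication only):

import gc
import numpy as np
import psutil

def released_kb(n):
    """RSS drop in KB after freeing an n-element float64 array."""
    x = np.random.uniform(0, 1, size=(n,))
    before = psutil.Process().memory_info().rss / 10**3
    del x
    gc.collect()
    after = psutil.Process().memory_info().rss / 10**3
    return before - after

# Sweep sizes around the suspected boundary (8 bytes per element):
for n in (10**4, 2 * 10**4, 5 * 10**4, 10**5):
    print(f"{8 * n:>8} bytes -> released {released_kb(n):.0f} KB")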

So:

  • Is this behavior documented? (And what is the exact boundary at which the allocation strategy changes?)
  • What are the internals of these strategies (beyond assuming the use of mmap/munmap to release the memory back to the OS)?
  • Is this 100% done by the Python runtime, or does Numpy have a specific way of handling this? (The numpy doc mentions NPY_USE_PYMEM, which switches between memory allocators.)
Answer

What you observe isn't CPython's strategy, but the strategy of the memory allocator that comes with the C runtime your CPython version is using.

When CPython allocates/deallocates memory via malloc/free, it doesn't communicate directly with the underlying OS, but with a concrete implementation of a memory allocator. In my case on Linux, that is the GNU Allocator.

The GNU Allocator has different so-called arenas, where freed memory isn't returned to the OS but kept around so it can be reused without the need to communicate with the OS. However, if a large amount of memory is requested (whatever the definition of "large" is), the allocator doesn't use memory from the arenas but requests it from the OS directly, and as a consequence can give it straight back to the OS once free is called.
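For glibc, that cut-off is controlled by the mallopt(3) parameter M_MMAP_THRESHOLD (128 KiB by default, adjusted dynamically): requests above it are served by mmap and can be unmapped on free. Note that 8*10**4 bytes is just below and 8*10**5 bytes well above this default, which matches the observations above. A minimal sketch, assuming Linux with glibc, that lowers the threshold from Python via ctypes:

import ctypes

# Assumes Linux with glibc. M_MMAP_THRESHOLD is -3 in glibc's <malloc.h>;
# allocations larger than the threshold are served by mmap and given back
# to the OS on free, smaller ones come from an arena and are kept.
libc = ctypes.CDLL("libc.so.6")
M_MMAP_THRESHOLD = -3

# Force (almost) every allocation through mmap; after this, even the
# "small" experiment above should release its memory back to the OS.
libc.mallopt(M_MMAP_THRESHOLD, 0)

The same effect can be had without code by setting the glibc environment variable MALLOC_MMAP_THRESHOLD_ before starting the interpreter.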


CPython has its own memory allocator - pymalloc, which is built on top of the C-runtime allocator. It is optimized for small objects, which live in special arenas; there is less overhead when creating/freeing these objects compared to going through the underlying C-runtime allocator. However, objects bigger than 512 bytes don't use these arenas and are managed directly by the C-runtime allocator.
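That 512-byte cut-off is SMALL_REQUEST_THRESHOLD in CPython's Objects/obmalloc.c. A small sketch to illustrate which side of the boundary an allocation falls on:

import sys

# Requests of at most 512 bytes are served by pymalloc's arenas;
# larger requests fall through to the C runtime's malloc.
small = b"x" * 100      # well under 512 bytes -> pymalloc
large = b"x" * 10**6    # far over 512 bytes   -> raw C malloc

# Dump pymalloc's arena/pool statistics (CPython-specific); these
# only cover the small allocations.
sys._debugmallocstats()

Running with PYTHONMALLOC=malloc disables pymalloc altogether, which is a quick way to check whether an effect is caused by pymalloc or by the C runtime's allocator.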

The situation is even more complex with numpy arrays, because different memory allocators are used for the meta-data (like shape, datatype and other flags) and for the actual data itself:

  1. For the meta-data, PyArray_malloc, i.e. CPython's memory allocator (pymalloc), is used.
  2. For the data itself, PyDataMem_NEW is used, which utilizes the underlying C-runtime functionality directly:
NPY_NO_EXPORT void *
PyDataMem_NEW(size_t size)
{
    void *result;

    result = malloc(size);
    ...
    return result;
}
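As a side note: in newer numpy versions (1.22 and later, per NEP 49) the data allocator has become pluggable, and one can query which handler owns an array's data. A hedged sketch, assuming numpy 1.22-1.26 (the module path np.core was renamed to np._core in numpy 2.0):

import numpy as np

# Assumes numpy >= 1.22 and < 2.0; reports the data-memory handler
# that allocated this array's buffer.
x = np.zeros(10)
print(np.core.multiarray.get_handler_name(x))  # e.g. "default_allocator"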

I'm not sure what the exact idea behind this design was: obviously one would like to profit from the small-object optimization of pymalloc, yet for the data this optimization would never work, but then one could have used PyMem_RawMalloc instead of malloc. Maybe the goal was to be able to wrap numpy arrays around memory allocated by C routines and take over the ownership of that memory (but this will not work in some circumstances, see my comment at the end of this post).
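As an illustration of that wrapping idea, here is a minimal sketch (assuming Linux with glibc; the buffer and its size are made up for the example) that views malloc'ed memory as a numpy array without copying:

import ctypes
import numpy as np

# Assumes Linux with glibc; on other platforms the C library name differs.
libc = ctypes.CDLL("libc.so.6")
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

n = 10**6
ptr = libc.malloc(n * 8)   # raw buffer from the C runtime's malloc

# View the buffer as a float64 array without copying. numpy does NOT own
# this memory: the array must not be used after the buffer is freed, and
# the buffer must be freed with the free() matching the malloc() above.
buf = (ctypes.c_double * n).from_address(ptr)
arr = np.frombuffer(buf, dtype=np.float64)
arr[:] = 0.0

del arr, buf               # drop all views first
libc.free(ptr)             # then release the raw buffer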

This explains the behavior you are observing: for the data (whose size changes depending on the size argument passed in), PyDataMem_NEW is used, which bypasses CPython's memory allocator, so you see the original behavior of the C runtime's allocator.


One should try to avoid mixing different allocation/deallocation routines (PyArray_malloc/PyDataMem_NEW/malloc and PyArray_free/PyDataMem_FREE/free): even if it works for the OS and Python version at hand, it might fail for other combinations.

For example on Windows, when an extension is built with a different compiler version, one executable might host memory allocators from different C runtimes, and malloc/free might end up talking to different C memory allocators, which can lead to hard-to-track-down errors.
