How do I find what is using memory in a Python process in a production system?

2024/11/20 15:35:26

My production system occasionally exhibits a memory leak I have not been able to reproduce in a development environment. I've used a Python memory profiler (specifically, Heapy) with some success in the development environment, but it can't help me with things I can't reproduce, and I'm reluctant to instrument our production system with Heapy because it takes a while to do its thing and its threaded remote interface does not work well in our server.

What I think I want is a way to dump a snapshot of the production Python process (or at least the output of gc.get_objects), and then analyze it offline to see where it is using memory. How do I get a core dump of a Python process like this? Once I have one, how do I do something useful with it?

Answer

Using Python's gc (garbage collector) interface and sys.getsizeof(), it's possible to dump the Python objects tracked by the collector along with their sizes. Here's the code I'm using in production to troubleshoot a memory leak:

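import cPickle
import gc
import os
import sys

import psutil

def memory_dump():
    # Record id, class, shallow size, and referent ids for each gc-tracked object.
    xs = []
    for obj in gc.get_objects():
        i = id(obj)
        size = sys.getsizeof(obj, 0)
        #    referrers = [id(o) for o in gc.get_referrers(obj) if hasattr(o, '__class__')]
        referents = [id(o) for o in gc.get_referents(obj) if hasattr(o, '__class__')]
        if hasattr(obj, '__class__'):
            cls = str(obj.__class__)
            xs.append({'id': i, 'class': cls, 'size': size, 'referents': referents})
    # Close the file before returning so the data is flushed ahead of os.abort().
    with open("memory.pickle", 'wb') as dump:
        cPickle.dump(xs, dump)

# Older psutil releases spell this get_memory_info(); newer ones use memory_info().
rss = psutil.Process(os.getpid()).get_memory_info().rss
# Dump variables if using more than 100MB of memory
if rss > 100 * 1024 * 1024:
    memory_dump()
    os.abort()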

Note that I'm only saving data from objects that have a __class__ attribute because those are the only objects I care about. It should be possible to save the complete list of objects, but you will need to take care choosing other attributes. Also, I found that getting the referrers for each object was extremely slow so I opted to save only the referents. Anyway, after the crash, the resulting pickled data can be read back like this:

with open("memory.pickle", 'rb') as dump:
    objs = cPickle.load(dump)
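Once the records are loaded, one way to do something useful with them is to total the sizes per class and look at the largest groups first. The following is only a sketch: summarize_by_class and the sample records are hypothetical names and data made up for illustration, and the records are assumed to have the same 'id', 'class', 'size', and 'referents' keys that memory_dump() writes.

```python
from collections import defaultdict

def summarize_by_class(objs):
    """Aggregate dump records into (class, count, total_bytes), largest first."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for rec in objs:
        counts[rec['class']] += 1
        totals[rec['class']] += rec['size']
    rows = [(cls, counts[cls], totals[cls]) for cls in totals]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

# Hypothetical records in the same shape that memory_dump() writes.
sample = [
    {'id': 1, 'class': "<class 'dict'>", 'size': 240, 'referents': [2]},
    {'id': 2, 'class': "<class 'list'>", 'size': 88, 'referents': []},
    {'id': 3, 'class': "<class 'dict'>", 'size': 240, 'referents': []},
]

for cls, count, total in summarize_by_class(sample):
    print("%-20s %6d objects %10d bytes" % (cls, count, total))
```

Keep in mind that sys.getsizeof() reports a shallow size, so each total covers the objects of that class themselves and not whatever they reference.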

Added 2017-11-15

The Python 3.6 version is here:

import gc
import sys
import _pickle as cPickle

def memory_dump():
    with open("memory.pickle", 'wb') as dump:
        xs = []
        for obj in gc.get_objects():
            i = id(obj)
            size = sys.getsizeof(obj, 0)
            #    referrers = [id(o) for o in gc.get_referrers(obj) if hasattr(o, '__class__')]
            referents = [id(o) for o in gc.get_referents(obj) if hasattr(o, '__class__')]
            if hasattr(obj, '__class__'):
                cls = str(obj.__class__)
                xs.append({'id': i, 'class': cls, 'size': size, 'referents': referents})
        cPickle.dump(xs, dump)
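
Because the dump stores only referents, the referrers that were too slow to collect in production can be rebuilt offline by inverting that mapping. This is a rough sketch under the assumption that the records keep the 'id', 'class', and 'referents' keys shown above; build_referrer_index, describe_referrers, and the sample data are hypothetical and not part of the original code.

```python
from collections import defaultdict

def build_referrer_index(objs):
    """Map each object id to the ids of dumped objects that refer to it."""
    referrers = defaultdict(list)
    for rec in objs:
        for target in rec['referents']:
            referrers[target].append(rec['id'])
    return referrers

def describe_referrers(objs, target_id):
    """Return (id, class) for every dumped object that refers to target_id."""
    by_id = {rec['id']: rec for rec in objs}
    index = build_referrer_index(objs)
    return [(rid, by_id[rid]['class']) for rid in index.get(target_id, [])]

# Hypothetical records: objects 1 and 3 both hold a reference to object 2.
sample = [
    {'id': 1, 'class': "<class 'dict'>", 'size': 240, 'referents': [2]},
    {'id': 2, 'class': "<class 'list'>", 'size': 88, 'referents': []},
    {'id': 3, 'class': "<class 'list'>", 'size': 72, 'referents': [2]},
]

print(describe_referrers(sample, 2))
```

The ids are only meaningful within a single dump, and a referents list can contain ids of objects that gc.get_objects() did not return (for example, objects the collector does not track), so not every referenced id will have a record of its own.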