Why does comparison of a numpy array with a list consume so much memory?

2024/10/4 23:20:38

This bit stung me recently. I solved it by removing all comparisons of numpy arrays with lists from the code. But why does the garbage collector fail to collect it?

Run this and watch it eat your memory:

import numpy as np

r = np.random.rand(2)
l = []
while True:
    # shapes (2,) and (0,) cannot be broadcast; the result is discarded
    r == l

Running on 64-bit Ubuntu 10.04, virtualenv 1.7.2, Python 2.7.3, NumPy 1.6.2.

Answer

Just in case someone stumbles on this and wonders...

@Dugal yes, I believe this is a memory leak in current numpy versions (Sept. 2012) that occurs when some Exceptions are raised (see this and this). Why adding the gc call that @BiRico did "fixes" it seems weird to me, though apparently it must be done right after? Maybe it's an oddity in how Python garbage collects tracebacks. If someone knows the exception handling and garbage collection internals of CPython, I would be interested.

Workaround: This is not directly related to lists; most broadcasting Exceptions, for example, trigger it. The empty list does not fit the array's size, and an empty array results in the same leak. Note that internally an Exception is prepared that never surfaces. So as a workaround, you should probably just check first whether the shape is correct (if you do this a lot; otherwise I wouldn't really worry, since this leaks just a small string if I got it right).
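A minimal sketch of that shape check, assuming an exact shape match is what you want (the helper name equal_if_same_shape is made up for illustration, and returning None for a mismatch is just one possible convention):

```python
import numpy as np


def equal_if_same_shape(arr, other):
    """Hypothetical helper: compare elementwise only when the shapes match exactly."""
    other = np.asarray(other)
    if arr.shape != other.shape:
        # Shapes differ, so skip the comparison and never reach the broadcasting error.
        return None
    return arr == other


r = np.random.rand(2)
print(equal_if_same_shape(r, []))        # shapes (2,) vs (0,): comparison is skipped
print(equal_if_same_shape(r, r.copy()))  # same shape: elementwise boolean array
```

This is stricter than numpy's own broadcasting rules, since it also rejects shapes that could be broadcast together (such as comparing against a scalar), so relax the check if you rely on broadcasting.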

FIXED: This issue will be fixed with numpy 1.7.
