Concatenate NumPy arrays using the least memory

2024/9/8 10:25:32

I have a 50GB dataset saved in an HDF5 file (read with h5py), which behaves like a dictionary. It contains keys from 0 to n, and the values are 3-dimensional NumPy ndarrays that all have the same shape. For example:

dictionary[0] = np.array([[[...],[...]]...])

I want to concatenate all of these arrays, with code like:

sample = np.concatenate(list(dictionary.values()))

This operation uses 100GB of memory! If I then run

del dictionary

memory usage drops back to 50GB. But I want to keep memory usage at 50GB while the data is loading. Another approach I tried, inside a loop over the keys, is this:

    sample = np.concatenate((sample, dictionary[key]))

It still uses 100GB of memory. I think that in all of the cases above, the right-hand side allocates a new memory block for the result, which is then assigned to the left-hand side, so memory usage doubles during the computation. So the third approach I tried is this:

sample = np.empty(shape)
with h5py.File(...) as dictionary:
    for key in dictionary.keys():
        sample[key] = dictionary[key]

I think this code has an advantage: the value dictionary[key] is assigned to one row of sample, and then the memory held by dictionary[key] is released. However, when I tested it, memory usage was still 100GB. Why?

Is there a good way to keep memory usage at 50GB?
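The reasoning above about a new memory block can be checked on a small scale. The following is a minimal sketch, not part of the original post: `blocks` is a hypothetical in-memory stand-in for the real dataset, with made-up shapes. It shows that `np.concatenate` returns a separate buffer as large as all of its inputs together, so inputs and output coexist until the inputs are released.

```python
import numpy as np

# Hypothetical stand-in for the real data: 4 blocks with the same 3-D shape.
blocks = {i: np.full((2, 3, 4), i, dtype=np.float64) for i in range(4)}

joined = np.concatenate(list(blocks.values()))

# Total size of the inputs, and whether the result reuses any input buffer.
input_bytes = sum(b.nbytes for b in blocks.values())
shares_any = any(np.shares_memory(joined, b) for b in blocks.values())

print(joined.shape)                  # (8, 3, 4)
print(joined.nbytes == input_bytes)  # True: the output is as large as all inputs
print(shares_any)                    # False: the output is a fresh allocation
```

While both `blocks` and `joined` are alive, roughly twice the data size is held in memory, which matches the 100GB observation for a 50GB dataset.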

Answer

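Your problem is that two copies of the same data have to be in memory at once: the dictionary and the concatenated array. If you build the array as in test1, popping each item out of the dictionary as it is consumed, you'll need far less memory at once, but at the cost of losing the dictionary.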

import numpy as np
import time

def test1(n):
    a = {x: (x, x, x) for x in range(n)}  # Build sample data
    # pop() removes each item from the dictionary as it is consumed
    b = np.array([a.pop(i) for i in range(n)]).reshape(-1)
    return b

def test2(n):
    a = {x: (x, x, x) for x in range(n)}  # Build sample data
    # the dictionary stays fully alive while the new array is built
    b = np.concatenate(list(a.values()))
    return b

x1 = test1(1000000)
del x1
time.sleep(1)
x2 = test2(1000000)

Results:

[Image: memory usage over time while running test1, then test2]

test1 : 0.71 s
test2 : 1.39 s

The first peak is for test1. It's not exactly in place, but it reduces peak memory usage quite a bit.
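The same idea can be applied to arrays by preallocating the result and copying one block at a time. This is a rough sketch under stated assumptions, not the code from the answer: `concat_consuming` is a hypothetical helper, the blocks live in a plain dict keyed 0 to n-1, and they are joined along the first axis as `np.concatenate` would do by default. Note that `np.empty(shape)` defaults to float64, so if the stored data is float32 the preallocated array alone would be twice the size of the data; that may or may not explain the 100GB seen in the question, but passing the dtype explicitly rules it out.

```python
import numpy as np

def concat_consuming(blocks, n):
    """Join blocks[0..n-1] along axis 0, removing each block once it is copied.

    Hypothetical helper: `blocks` is assumed to be a dict of equally shaped
    arrays keyed 0..n-1. The dict is emptied as a side effect.
    """
    first = blocks[0]
    step = first.shape[0]
    # Allocate the result once, with the dtype of the data (not float64 by default).
    out = np.empty((n * step,) + first.shape[1:], dtype=first.dtype)
    del first  # drop the extra reference so the popped block can be freed
    for i in range(n):
        block = blocks.pop(i)                 # the dict no longer holds this block
        out[i * step:(i + 1) * step] = block  # copy into the preallocated slice
        del block                             # release the block before the next one
    return out

# Small usage example with made-up shapes.
blocks = {i: np.full((2, 3, 4), i, dtype=np.float32) for i in range(5)}
expected = np.concatenate([blocks[i] for i in range(5)])  # reference result
result = concat_consuming(blocks, 5)
print(result.shape, result.dtype, len(blocks))
```

When the blocks come from an h5py file rather than a dict, each dataset is only read from disk when it is sliced or converted, so the loop never needs more than the result plus one block. h5py's `Dataset.read_direct` can also read straight into a slice of the preallocated array, which avoids the temporary block.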
