How to split a NumPy array and perform certain actions on the split arrays [Python]

2024/9/24 21:23:43

Only part of this question has been asked before ([1][2]), and those answers explained how to split NumPy arrays. I am quite new to Python. I have an array containing 262144 items and want to split it into smaller arrays of length 512, sort them individually, and sum up their first five values, but I am unsure how to proceed beyond this line:

np.array_split(vector, 512)

How do I call and analyse each array? Would it be a good idea to continue using NumPy arrays, or should I revert to using a dictionary instead?

Answer

Splitting as such won't be an efficient solution. Instead, we can reshape, which effectively creates the subarrays as rows of a 2D array. These rows are views into the input array, so there is no additional memory requirement. Then we get the argsort indices, select the first five indices per row, and finally sum the values at those indices for the desired output.
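To make the difference between the two approaches concrete, here is a minimal sketch. It assumes a stand-in input built with np.arange, since the real data is not shown in the question. It checks that the reshaped array shares memory with the input, and shows that np.array_split takes the number of sections as its second argument (not the section length) and returns a Python list of 1D arrays.

```python
import numpy as np

a = np.arange(262144)          # stand-in for the real input (assumption)
N = 512                        # length of each chunk

b = a.reshape(-1, N)           # 2D array, one chunk per row
parts = np.array_split(a, a.size // N)  # second argument = number of sections

print(b.shape)                 # (512, 512)
print(np.shares_memory(a, b))  # True: b is a view, no data was copied
print(type(parts).__name__, len(parts), parts[0].shape)
```

With 262144 items and chunks of 512, the number of sections and the section length happen to be equal, which is why np.array_split(vector, 512) gives the intended chunks in this particular case.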

Thus, we would have an implementation like so -

import numpy as np
# a is the 1D input array
N = 512 # Number of elements in each split array
M = 5   # Number of elements in each subarray for sorting and summing
b = a.reshape(-1,N)
out = b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
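The same steps can be wrapped in a small helper. This is only a sketch: the function name sum_smallest_per_chunk, the length check, and the random test data are assumptions added for illustration, not part of the original answer.

```python
import numpy as np

def sum_smallest_per_chunk(a, n, m):
    """Sum the m smallest values in each consecutive chunk of length n."""
    a = np.asarray(a)
    if a.ndim != 1 or a.size % n != 0:
        raise ValueError("expected a 1D array whose length is a multiple of n")
    b = a.reshape(-1, n)                    # one chunk per row
    idx = b.argsort(axis=1)[:, :m]          # column indices of the m smallest per row
    rows = np.arange(b.shape[0])[:, None]   # row indices, broadcast against idx
    return b[rows, idx].sum(axis=1)

rng = np.random.default_rng(0)              # seeded only for repeatability
a = rng.integers(11, 99, size=262144)       # made-up data of the question's size
out = sum_smallest_per_chunk(a, 512, 5)
print(out.shape)  # (512,)
```

The length check is there because reshape(-1, n) raises an error when the array length is not a multiple of n; np.array_split would instead return chunks of unequal length.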

Step-by-step sample run -

In [217]: a   # Input array
Out[217]: array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])

In [218]: N = 7 # 512 for original case, 7 for sample

In [219]: M = 5

# Reshape into a 2D array with N columns
In [220]: b = a.reshape(-1,N)

In [224]: b
Out[224]: 
array([[45, 19, 71, 53, 20, 33, 31],
       [20, 41, 19, 38, 31, 86, 34]])

# Get argsort indices per row
In [225]: b.argsort(1)
Out[225]: 
array([[1, 4, 6, 5, 0, 3, 2],
       [2, 0, 4, 6, 3, 1, 5]])

# Select first M ones
In [226]: b.argsort(1)[:,:M]
Out[226]: 
array([[1, 4, 6, 5, 0],
       [2, 0, 4, 6, 3]])

# Use fancy-indexing to select those M ones per row
In [227]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]]
Out[227]: 
array([[19, 20, 31, 33, 45],
       [19, 20, 31, 34, 38]])

# Finally sum along each row
In [228]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
Out[228]: array([148, 142])
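For comparison, the question's own starting point (np.array_split followed by handling each piece) can be written as an explicit loop. The sketch below uses the sample array from the run above and checks that the loop and the reshape-based expression agree; the variable names chunks, loop_out and vec_out are chosen here for illustration.

```python
import numpy as np

a = np.array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
N, M = 7, 5

# Loop version: split into a list of chunks, then sort and sum each one
chunks = np.array_split(a, a.size // N)     # list of 1D arrays, each of length N
loop_out = np.array([np.sort(c)[:M].sum() for c in chunks])

# Vectorized version from the answer
b = a.reshape(-1, N)
vec_out = b[np.arange(b.shape[0])[:, None], b.argsort(1)[:, :M]].sum(1)

print(loop_out)  # [148 142]
print(vec_out)   # [148 142]
```

The loop answers "how do I call and analyse each array" directly, since each element of chunks is an ordinary NumPy array, while the reshape version does the same work without a Python-level loop.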

Performance boost with np.argpartition -

out = b[np.arange(b.shape[0])[:,None], np.argpartition(b,M,axis=1)[:,:M]].sum(1)
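np.argpartition only guarantees that the first M positions of each row hold the M smallest values, in no particular order. That is enough here because a sum does not depend on order. The sketch below, on made-up random data, checks that it gives the same result as the argsort version. It also shows np.partition, a variant not used in the original answer, which returns the partitioned values directly so no index lookup is needed.

```python
import numpy as np

rng = np.random.default_rng(0)              # seeded only for repeatability
a = rng.integers(11, 99, size=512 * 512)    # made-up data
N, M = 512, 5                               # kth=M requires M < N
b = a.reshape(-1, N)
rows = np.arange(b.shape[0])[:, None]

by_argsort = b[rows, b.argsort(1)[:, :M]].sum(1)
by_argpartition = b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(1)
by_partition = np.partition(b, M, axis=1)[:, :M].sum(1)  # values directly

print(np.array_equal(by_argsort, by_argpartition))  # True
print(np.array_equal(by_argsort, by_partition))     # True
```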

Runtime test -

In [236]: a = np.random.randint(11,99,(512*512))

In [237]: N = 512

In [238]: M = 5

In [239]: b = a.reshape(-1,N)

In [240]: %timeit b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
100 loops, best of 3: 14.2 ms per loop

In [241]: %timeit b[np.arange(b.shape[0])[:,None], \
     ...:            np.argpartition(b,M,axis=1)[:,:M]].sum(1)
100 loops, best of 3: 3.57 ms per loop
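Outside IPython, a comparable measurement can be taken with the standard-library timeit module. This is a rough sketch on made-up random data; the function names are illustrative, and the timings will vary with the machine and NumPy version, so it is not expected to reproduce the figures above.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)              # seeded only for repeatability
b = rng.integers(11, 99, size=512 * 512).reshape(-1, 512)
M = 5
rows = np.arange(b.shape[0])[:, None]

def with_argsort():
    return b[rows, b.argsort(1)[:, :M]].sum(1)

def with_argpartition():
    return b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(1)

# Best of 3 repeats, 10 calls each, reported per call
t_sort = min(timeit.repeat(with_argsort, number=10, repeat=3)) / 10
t_part = min(timeit.repeat(with_argpartition, number=10, repeat=3)) / 10
print(f"argsort:      {t_sort * 1e3:.2f} ms per call")
print(f"argpartition: {t_part * 1e3:.2f} ms per call")
```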