Only part of this question has been asked before ([1][2]), which explained how to split NumPy arrays. I am quite new to Python. I have an array containing 262144 items and want to split it into small arrays of length 512, sort each of them individually and sum up their first five values, but I am unsure how to proceed beyond this line:
np.array_split(vector, 512)
How do I access and analyse each array? Would it be a good idea to keep using a NumPy array, or should I revert to using a dictionary instead?
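Before vectorising anything, the loop question itself has a direct answer. `np.array_split` returns an ordinary Python list of 1D NumPy arrays, so each piece can be processed in a plain `for` loop and no dictionary is needed. One detail is worth noting: the second argument of `np.array_split` is the *number* of pieces, not the piece length. It only yields pieces of length 512 here because 262144 = 512 × 512. The sketch below uses random stand-in data in place of the real `vector`:

```python
import numpy as np

# Stand-in data: the question's array has 262144 = 512 * 512 items.
rng = np.random.default_rng(0)
vector = rng.integers(11, 99, size=512 * 512)

# The second argument of np.array_split is the NUMBER of pieces,
# which here happens to equal the desired piece length (512).
chunks = np.array_split(vector, 512)

# chunks is a plain Python list of 1D arrays; loop over it directly.
sums = []
for chunk in chunks:
    smallest_five = np.sort(chunk)[:5]  # sort this chunk, keep the 5 smallest
    sums.append(smallest_five.sum())

sums = np.array(sums)  # one value per chunk
print(sums.shape)      # (512,)
```

This works, but it runs a Python-level loop over 512 pieces. The answer below avoids that loop.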
Splitting as such won't be an efficient solution. Instead, we could reshape, which effectively creates the subarrays as rows of a 2D array. For a contiguous input, these rows are views into the original array, so no additional memory is required. Then we get the argsort indices, select the first five indices per row, and finally sum the corresponding values to get the desired output.
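`reshape` returns a view when the input's memory layout allows it, for example with a freshly created, contiguous 1D array. Otherwise it makes a copy. A quick way to check this is `np.shares_memory`, shown here on a small stand-in array:

```python
import numpy as np

a = np.arange(14)       # contiguous 1D input
b = a.reshape(-1, 7)    # -1 lets NumPy infer the number of rows (2 here)

print(b.shape)                 # (2, 7)
print(np.shares_memory(a, b))  # True: b is a view, no data was copied

# Because b is a view, writing through it changes a as well.
b[0, 0] = 100
print(a[0])                    # 100
```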
Thus, we would have an implementation like so -
import numpy as np
N = 512  # Number of elements in each split array
M = 5    # Number of smallest elements per row to sum
b = a.reshape(-1,N)  # a is the input array; each row of b is one split array
out = b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
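Only the values are summed, not their positions, so two equivalent spellings may also be useful. `np.take_along_axis` (available since NumPy 1.15) replaces the `np.arange(...)[:,None]` indexing idiom. `np.sort` along axis 1 skips the index step entirely. This sketch runs on random stand-in data and checks that all three variants agree:

```python
import numpy as np

N, M = 512, 5
rng = np.random.default_rng(1)
a = rng.integers(11, 99, size=N * N)  # stand-in for the real data
b = a.reshape(-1, N)

# Approach from the answer: argsort + fancy indexing with a broadcast row index.
out_fancy = b[np.arange(b.shape[0])[:, None], b.argsort(1)[:, :M]].sum(1)

# Same result via take_along_axis, which pairs each row with its own indices.
out_take = np.take_along_axis(b, b.argsort(axis=1)[:, :M], axis=1).sum(axis=1)

# Same result again by sorting the values directly (no indices needed).
out_sort = np.sort(b, axis=1)[:, :M].sum(axis=1)

print(np.array_equal(out_fancy, out_take), np.array_equal(out_fancy, out_sort))  # True True
```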
Step-by-step sample run -
In [217]: a # Input array
Out[217]: array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
In [218]: N = 7 # 512 for original case, 7 for sample
In [219]: M = 5
# Reshape into a 2D array with N elements per row
In [220]: b = a.reshape(-1,N)
In [224]: b
Out[224]:
array([[45, 19, 71, 53, 20, 33, 31],
       [20, 41, 19, 38, 31, 86, 34]])
# Get argsort indices per row
In [225]: b.argsort(1)
Out[225]:
array([[1, 4, 6, 5, 0, 3, 2],
       [2, 0, 4, 6, 3, 1, 5]])
# Select first M ones
In [226]: b.argsort(1)[:,:M]
Out[226]:
array([[1, 4, 6, 5, 0],
       [2, 0, 4, 6, 3]])
# Use fancy-indexing to select those M ones per row
In [227]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]]
Out[227]:
array([[19, 20, 31, 33, 45],
       [19, 20, 31, 34, 38]])
# Finally sum along each row
In [228]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
Out[228]: array([148, 142])
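The indexing step at In [227] relies on broadcasting. The row index has shape (rows, 1) and the column index has shape (rows, M), so together they broadcast to (rows, M). Element [i, j] of the result is then `b[i, cols[i, j]]`. This sketch reuses the sample values and prints the shapes involved; `rows`, `cols` and `picked` are illustrative names:

```python
import numpy as np

a = np.array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
N, M = 7, 5
b = a.reshape(-1, N)

rows = np.arange(b.shape[0])[:, None]  # shape (2, 1): [[0], [1]]
cols = b.argsort(1)[:, :M]             # shape (2, 5): column positions per row
print(rows.shape, cols.shape)          # (2, 1) (2, 5)

# rows broadcasts across the 5 columns, pairing every column index with its row.
picked = b[rows, cols]
print(picked.shape)                    # (2, 5)
print(picked.sum(1))                   # [148 142]
```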
Performance boost with np.argpartition. We only need the M smallest values per row, not their sorted order, so a partial sort is enough:
out = b[np.arange(b.shape[0])[:,None], np.argpartition(b,M,axis=1)[:,:M]].sum(1)
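`np.argpartition(b, M, axis=1)` only guarantees that the element at column M lands in its sorted position, with no larger element before it. The first M columns therefore hold the M smallest values, just not in sorted order. That is sufficient here because the sum does not depend on order. If the indices themselves are not needed, `np.partition` returns the values directly. With kth=M-1, columns 0..M-1 again hold the M smallest values. The sketch below runs on random stand-in data and checks both variants against a full sort:

```python
import numpy as np

N, M = 512, 5
rng = np.random.default_rng(2)
b = rng.integers(11, 99, size=N * N).reshape(-1, N)  # stand-in data

# Full sort: orders every row completely, which we do not need.
expected = np.sort(b, axis=1)[:, :M].sum(axis=1)

# Partial partition: the M smallest values end up (unordered) in columns 0..M-1.
via_argpartition = b[np.arange(b.shape[0])[:, None],
                     np.argpartition(b, M, axis=1)[:, :M]].sum(1)
via_partition = np.partition(b, M - 1, axis=1)[:, :M].sum(axis=1)

print(np.array_equal(expected, via_argpartition),
      np.array_equal(expected, via_partition))  # True True
```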
Runtime test -
In [236]: a = np.random.randint(11,99,(512*512))
In [237]: N = 512
In [238]: M = 5
In [239]: b = a.reshape(-1,N)
In [240]: %timeit b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
100 loops, best of 3: 14.2 ms per loop
In [241]: %timeit b[np.arange(b.shape[0])[:,None], \
     ...: np.argpartition(b,M,axis=1)[:,:M]].sum(1)
100 loops, best of 3: 3.57 ms per loop
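`%timeit` is an IPython magic. Outside IPython, the standard-library `timeit` module can run a similar comparison. This is a sketch on random stand-in data, and no timings are shown because they vary by machine. The function names are hypothetical. Unlike the one-liners above, the row index is precomputed once, so the numbers are not directly comparable to the ones shown:

```python
import timeit
import numpy as np

N, M = 512, 5
a = np.random.randint(11, 99, N * N)  # stand-in data
b = a.reshape(-1, N)
rows = np.arange(b.shape[0])[:, None]  # precomputed once, outside the timed calls

def with_argsort():
    return b[rows, b.argsort(1)[:, :M]].sum(1)

def with_argpartition():
    return b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(1)

# Both variants must agree before timing them.
assert np.array_equal(with_argsort(), with_argpartition())

for fn in (with_argsort, with_argpartition):
    # Best of 3 repeats, each timing 20 calls; report milliseconds per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```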