numpy.ndarray enumeration over a proper subset of the dimensions?


(In this post, let np be shorthand for numpy.)

Suppose a is an (n + k)-dimensional np.ndarray object, for some integers n > 1 and k > 1 (IOW, a.ndim = n + k > 3). I want to enumerate a over its first n dimensions; this means that, at each iteration, the enumerator/iterator produces a pair whose first element is a tuple ii of n indices, and whose second element is the k-dimensional subarray at a[ii].

Granted, it is not difficult to code a function to do this (in fact, I give an example of such a function below), but I want to know this:

does numpy provide any special syntax or functions for carrying out this type of "partial" enumeration?

(Normally, when I want to iterate over a multidimensional np.ndarray object, I use np.ndenumerate, but it wouldn't help here, because (as far as I can tell) np.ndenumerate iterates over all n + k dimensions.)

Assuming that the answer to the question above is yes, then there's this follow-up:

what about the case where the n dimensions to iterate over are not contiguous?

(In this case, the first element of the pair returned at each iteration by the enumerator/iterator would be a tuple of r > n elements, some of which would be a special value denoting "all", e.g. slice(None); the second element of this pair would still be a k-dimensional ndarray.)
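To make this follow-up concrete, here is a rough sketch of the kind of enumerator I have in mind (the axes parameter naming the dimensions to iterate over is just my own hypothetical interface; error handling omitted as before):

import numpy as np
import itertools as it

def partial_enumerate_axes(nda, axes):
    """Sketch: enumerate NDA over the dimensions listed in AXES.

    Yields pairs whose first element is a full-length index tuple with
    integers at the positions in AXES and slice(None) everywhere else;
    the second element is the subarray of NDA at that index.
    """
    axes = sorted(axes)
    for ii in it.product(*[range(nda.shape[ax]) for ax in axes]):
        index = [slice(None)] * nda.ndim
        for ax, i in zip(axes, ii):
            index[ax] = i
        yield tuple(index), nda[tuple(index)]

a = np.zeros((2, 3, 4, 5))
for jj, vv in partial_enumerate_axes(a, [0, 2]):
    # e.g. (0, slice(None, None, None), 0, slice(None, None, None)) (3, 5)
    print(jj, vv.shape)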

Thanks!


The code below hopefully clarifies the problem specification. The function partial_enumerate does what I would like to do, albeit without recourse to any special numpy constructs for the purpose. Following the definition of partial_enumerate is a simple example for the case n = k = 2.

import numpy as np
import itertools as it

def partial_enumerate(nda, n):
    """Enumerate over the first N dimensions of the numpy.ndarray NDA.

    Returns an iterator of pairs.  The first element of each pair is a
    tuple of N integers, corresponding to a partial index I into NDA;
    the second element is the subarray of NDA at I.
    """
    # ERROR CHECKING & HANDLING OMITTED
    for ii in it.product(*[range(d) for d in nda.shape[:n]]):
        yield ii, nda[ii]

a = np.zeros((2, 3, 4, 5))
for ii, vv in partial_enumerate(a, 2):
    print(ii, vv.shape)

Each line of the output is a pair of tuples: the first represents a partial set of n coordinates in a, and the second represents the shape of the k-dimensional subarray of a at those partial coordinates. (The second tuple is the same on every line, as expected from the regularity of the array.)

(0, 0) (4, 5)
(0, 1) (4, 5)
(0, 2) (4, 5)
(1, 0) (4, 5)
(1, 1) (4, 5)
(1, 2) (4, 5)

In contrast, iterating over np.ndenumerate(a) in this case would result in a.size iterations, each visiting an individual cell of a.
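For instance, continuing the example above, a quick sanity check of that claim:

assert sum(1 for _ in np.ndenumerate(a)) == a.size == 2 * 3 * 4 * 5  # 120 single-cell visits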

Answer

You can use numpy's broadcasting rules to generate a cartesian product. The numpy.ix_ function creates the appropriate arrays for you (in current numpy versions it returns them as a tuple; very old versions returned a list). It's roughly equivalent to the below:

>>> def pseudo_ix_gen(*arrays):
...     base_shape = [1 for arr in arrays]
...     for dim, arr in enumerate(arrays):
...         shape = base_shape[:]
...         shape[dim] = len(arr)
...         yield numpy.array(arr).reshape(shape)
... 
>>> def pseudo_ix_(*arrays):
...     return list(pseudo_ix_gen(*arrays))

Or, more concisely:

>>> def pseudo_ix_(*arrays):
...     shapes = numpy.diagflat([len(a) - 1 for a in arrays]) + 1
...     return [numpy.array(a).reshape(s) for a, s in zip(arrays, shapes)]

The result is a sequence of mutually broadcastable arrays:

>>> numpy.ix_(*[[2, 4], [1, 3], [0, 2]])
(array([[[2]], [[4]]]), array([[[1], [3]]]), array([[[0, 2]]]))

Compare this to the result of numpy.ogrid:

>>> numpy.ogrid[0:2, 0:2, 0:2]
[array([[[0]], [[1]]]), array([[[0], [1]]]), array([[[0, 1]]])]

As you can see, the arrays are the same, but numpy.ix_ allows you to use non-consecutive indices. Now, when we apply the numpy broadcasting rules, we get a cartesian product:

>>> list(numpy.broadcast(*numpy.ix_(*[[2, 4], [1, 3], [0, 2]])))
[(2, 1, 0), (2, 1, 2), (2, 3, 0), (2, 3, 2), (4, 1, 0), (4, 1, 2), (4, 3, 0), (4, 3, 2)]
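Tying this back to the question's partial_enumerate, here is a minimal sketch that reproduces its output for the a = numpy.zeros((2, 3, 4, 5)) example (the int conversion is only there to keep the printed tuples tidy across numpy versions):

>>> a = numpy.zeros((2, 3, 4, 5))
>>> idx = numpy.ix_(*[range(d) for d in a.shape[:2]])
>>> for ii in numpy.broadcast(*idx):
...     print(tuple(map(int, ii)), a[ii].shape)
... 
(0, 0) (4, 5)
(0, 1) (4, 5)
(0, 2) (4, 5)
(1, 0) (4, 5)
(1, 1) (4, 5)
(1, 2) (4, 5)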

If, instead of passing the result of numpy.ix_ to numpy.broadcast, we use it to index an array, we get this:

>>> a = numpy.arange(6 ** 4).reshape((6, 6, 6, 6))
>>> a[numpy.ix_(*[[2, 4], [1, 3], [0, 2]])]
array([[[[468, 469, 470, 471, 472, 473],
         [480, 481, 482, 483, 484, 485]],

        [[540, 541, 542, 543, 544, 545],
         [552, 553, 554, 555, 556, 557]]],


       [[[900, 901, 902, 903, 904, 905],
         [912, 913, 914, 915, 916, 917]],

        [[972, 973, 974, 975, 976, 977],
         [984, 985, 986, 987, 988, 989]]]])

However, caveat emptor. Broadcastable arrays are useful for indexing, but if you literally want to enumerate the index tuples, you might be better off using itertools.product:

>>> %timeit list(itertools.product(range(5), repeat=5))
10000 loops, best of 3: 196 us per loop
>>> %timeit list(numpy.broadcast(*numpy.ix_(*([range(5)] * 5))))
100 loops, best of 3: 2.74 ms per loop

So if you're incorporating a for loop anyway, then itertools.product will likely be faster. Still, you can use the above methods to get some similar data structures in pure numpy:

>>> pgrid_idx = numpy.ix_(*[[2, 4], [1, 3], [0, 2]])
>>> sub_indices = numpy.rec.fromarrays(numpy.indices((6, 6, 6)))
>>> a[pgrid_idx].reshape((8, 6))
array([[468, 469, 470, 471, 472, 473],
       [480, 481, 482, 483, 484, 485],
       [540, 541, 542, 543, 544, 545],
       [552, 553, 554, 555, 556, 557],
       [900, 901, 902, 903, 904, 905],
       [912, 913, 914, 915, 916, 917],
       [972, 973, 974, 975, 976, 977],
       [984, 985, 986, 987, 988, 989]])
>>> sub_indices[pgrid_idx].reshape((8,))
rec.array([(2, 1, 0), (2, 1, 2), (2, 3, 0), (2, 3, 2),
           (4, 1, 0), (4, 1, 2), (4, 3, 0), (4, 3, 2)],
          dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
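One last note on the original question: numpy does ship numpy.ndindex, which iterates over an index space (yielding plain-int tuples) the way numpy.ndenumerate iterates over an array's cells. A minimal sketch of the question's partial_enumerate built on it, for contiguous leading dimensions only (it doesn't address the non-contiguous follow-up):

>>> def partial_enumerate(nda, n):
...     for ii in numpy.ndindex(*nda.shape[:n]):
...         yield ii, nda[ii]
... 
>>> a = numpy.zeros((2, 3, 4, 5))
>>> [(ii, vv.shape) for ii, vv in partial_enumerate(a, 2)]
[((0, 0), (4, 5)), ((0, 1), (4, 5)), ((0, 2), (4, 5)), ((1, 0), (4, 5)), ((1, 1), (4, 5)), ((1, 2), (4, 5))]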