Count unique elements along an axis of a NumPy array

2024/10/6 6:52:31

I have a three-dimensional array like

A = np.array([[[1, 1],
               [1, 0]],
              [[1, 2],
               [1, 0]],
              [[1, 0],
               [0, 0]]])

Now I would like to obtain an array that holds a nonzero value at a given position if exactly one distinct nonzero value (possibly alongside zeros) occurs at that position along the first axis. It should hold zero if only zeros, or more than one distinct nonzero value, occur there. For the example above, I would like

[[1,0],
[1,0]]

since

  • in A[:,0,0] there are only 1s
  • in A[:,0,1] there are 0, 1 and 2, so more than one nonzero value
  • in A[:,1,0] there are 0 and 1, so 1 is retained
  • in A[:,1,1] there are only 0s

I can find how many nonzero elements there are with np.count_nonzero(A, axis=0), but I would like to keep 1s or 2s even if there are several of them. I looked at np.unique but it doesn't seem to support what I'd like to do.

Ideally, I'd like a function like np.count_unique(A, axis=0) which would return an array in the original shape, e.g. [[1, 3],[2, 1]], so I could check whether 3 or more occur and then ignore that position.


All I could come up with was a list comprehension iterating over the positions of the array that I'd like to obtain:

[[len(np.unique(A[:, i, j])) for j in range(A.shape[2])] for i in range(A.shape[1])]

Any other ideas?

Answer

You can use np.diff to stay at the NumPy level for the second task.

def diffcount(A):
    B = A.copy()
    B.sort(axis=0)
    C = np.diff(B, axis=0) > 0
    D = C.sum(axis=0) + 1
    return D

# [[1 3]
#  [2 1]]
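To see why sorting followed by np.diff counts distinct values, it may help to look at a single position. The sketch below applies the same steps to a 1-D column; count_distinct_1d is a hypothetical helper name used only for illustration and is not part of the answer above.

```python
import numpy as np

def count_distinct_1d(column):
    # Hypothetical helper: the sort-and-diff idea applied to one column.
    sorted_column = np.sort(column)
    # After sorting, equal values are adjacent, so every strict increase
    # between neighbours marks the first occurrence of a new value.
    steps = np.diff(sorted_column) > 0
    # +1 accounts for the smallest value, which has no predecessor.
    return int(steps.sum()) + 1

# A[:, 0, 1] and A[:, 1, 0] from the example array in the question.
print(count_distinct_1d(np.array([1, 2, 0])))  # 3
print(count_distinct_1d(np.array([1, 1, 0])))  # 2
```

Doing the same along axis=0 of the full array handles every position at once.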

diffcount seems to be a little faster on big arrays:

In [62]: A = np.random.randint(0, 100, (100, 100, 100))

In [63]: %timeit diffcount(A)
46.8 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [64]: timeit [[len(np.unique(A[:, i, j])) for j in range(A.shape[2])]\
for i in range(A.shape[1])]
149 ms ± 700 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Finally, counting unique values is a simpler problem than sorting, so a factor of ln(A.shape[0]) can potentially be gained.

One way to gain this factor is to use Python's set mechanism:

In [81]: %timeit np.apply_along_axis(lambda a: len(set(a)), 0, A)
183 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Unfortunately, this is not faster.
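The idea of recording which values occur, rather than sorting them, can also be sketched in vectorized NumPy with a boolean table. This is only an illustration under stated assumptions: countunique_flags is a hypothetical name, the values are assumed to be non-negative integers no larger than Amax, and the table needs Amax + 1 booleans per position, so it is practical only for a small value range. No timing is claimed for it.

```python
import numpy as np

def countunique_flags(A, Amax):
    # Hypothetical helper; assumes integer values with 0 <= A <= Amax.
    # seen[v, i, j] is set to True if value v occurs anywhere in A[:, i, j].
    seen = np.zeros((Amax + 1,) + A.shape[1:], dtype=bool)
    i, j = np.indices(A.shape[1:])   # index grids, broadcast against A
    seen[A, i, j] = True             # repeated values just set the same flag again
    return seen.sum(axis=0)          # number of flags set per position

A = np.array([[[1, 1],
               [1, 0]],
              [[1, 2],
               [1, 0]],
              [[1, 0],
               [0, 0]]])

print(countunique_flags(A, 2))
# [[1 3]
#  [2 1]]
```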

Another way is to do it by hand:

def countunique(A, Amax):
    res = np.empty(A.shape[1:], A.dtype)
    c = np.empty(Amax + 1, A.dtype)
    for i in range(A.shape[1]):
        for j in range(A.shape[2]):
            T = A[:, i, j]
            for k in range(c.size):
                c[k] = 0
            for x in T:
                c[x] = 1
            res[i, j] = c.sum()
    return res

At the Python level:

In [70]: %timeit countunique(A,100)
429 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is not so bad for a pure Python approach. Then just move this code to a lower level with numba:

import numba
countunique2 = numba.jit(countunique)

In [71]: %timeit countunique2(A, 100)
3.63 ms ± 70.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This will be difficult to improve on by much.
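The counts above cover the second task. For the first task in the question (keep the value where exactly one distinct nonzero value occurs, otherwise zero), one possible sketch compares the largest and smallest nonzero value at each position. keep_single_nonzero is a hypothetical name, and the code assumes an integer dtype whose extreme values do not themselves appear in the data, since they are used as sentinels.

```python
import numpy as np

def keep_single_nonzero(A):
    # Hypothetical helper; assumes an integer dtype (np.iinfo is used).
    info = np.iinfo(A.dtype)
    nonzero = A != 0
    # Zeros are replaced by sentinels so they never win the max or the min.
    largest = np.where(nonzero, A, info.min).max(axis=0)
    smallest = np.where(nonzero, A, info.max).min(axis=0)
    # Largest == smallest means exactly one distinct nonzero value occurs.
    # For an all-zero position the two sentinels differ, so 0 is returned.
    return np.where(largest == smallest, largest, 0)

A = np.array([[[1, 1],
               [1, 0]],
              [[1, 2],
               [1, 0]],
              [[1, 0],
               [0, 0]]])

print(keep_single_nonzero(A))
# [[1 0]
#  [1 0]]
```

This avoids counting altogether; if the counts themselves are needed, the functions above remain the way to get them.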

https://en.xdnf.cn/q/70397.html
