Question 1

I have a three-dimensional array like

A = np.array([[[1, 1],
               [1, 0]],
              [[1, 2],
               [1, 0]],
              [[1, 0],
               [0, 0]]])

Now I would like to obtain an array that holds a nonzero value in a given position if exactly one distinct nonzero value (possibly alongside zeros) occurs in that position along the first axis. It should hold zero if only zeros or more than one distinct nonzero value occur in that position. For the example above, I would like

[[1,0],
[1,0]]

since

in A[:,0,0] there are only 1s
in A[:,0,1] there are 0, 1 and 2, so more than one nonzero value
in A[:,1,0] there are 0 and 1, so 1 is retained
in A[:,1,1] there are only 0s

I can find how many nonzero elements there are with np.count_nonzero(A, axis=0), but I would like to keep 1s or 2s even if there are several of them. I looked at np.unique but it doesn't seem to support what I'd like to do.

Ideally, I'd like a function like np.count_unique(A, axis=0) which would return an array in the original shape, e.g. [[1, 3],[2, 1]], so I could check whether 3 or more occur and then ignore that position.

All I could come up with was a list comprehension iterating over the positions of the array that I'd like to obtain

[[len(np.unique(A[:, i, j])) for j in range(A.shape[2])] for i in range(A.shape[1])]

Any other ideas?

Answer 1

You can use np.diff to stay at the NumPy level for the second task (counting the unique values along an axis).

def diffcount(A):
    B = A.copy()
    B.sort(axis=0)
    C = np.diff(B, axis=0) > 0
    D = C.sum(axis=0) + 1
    return D

# [[1 3]
#  [2 1]]
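As a rough check of how the sort-and-diff trick works, the sketch below traces it on one column of the question's array. It then shows one possible way back to the array the question asks for. That last step is only an illustration: the helper name unique_nonzero is made up, and it assumes the values are non-negative integers, so that the column maximum is the single nonzero value whenever exactly one occurs.

```python
import numpy as np

A = np.array([[[1, 1], [1, 0]],
              [[1, 2], [1, 0]],
              [[1, 0], [0, 0]]])

def diffcount(A):
    # Sort along axis 0, then count the strict increases between neighbours.
    B = np.sort(A, axis=0)
    return (np.diff(B, axis=0) > 0).sum(axis=0) + 1

# Trace for the column A[:, 0, 1] = [1, 2, 0]:
col = np.sort(A[:, 0, 1])      # [0, 1, 2]
steps = np.diff(col) > 0       # [True, True]: 2 increases, so 3 distinct values

def unique_nonzero(A):
    # Hypothetical helper; assumes non-negative integers.
    # Zero, when present, is one of the distinct values, so subtract it.
    n_nonzero = diffcount(A) - (A == 0).any(axis=0).astype(int)
    # With exactly one distinct nonzero value, the column maximum is that value.
    return np.where(n_nonzero == 1, A.max(axis=0), 0)

print(diffcount(A))       # [[1 3]
                          #  [2 1]]
print(unique_nonzero(A))  # [[1 0]
                          #  [1 0]]
```

With negative values in the data, the maximum would no longer be guaranteed to be the single nonzero value, so this last step would need a different selection rule.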

diffcount seems to be a little faster than the list comprehension on big arrays:

In [62]: A=np.random.randint(0,100,(100,100,100))

In [63]: %timeit diffcount(A)
46.8 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [64]: timeit [[len(np.unique(A[:, i, j])) for j in range(A.shape[2])]\
for i in range(A.shape[1])]
149 ms ± 700 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Finally, counting unique values is a simpler problem than sorting, so a factor of ln(A.shape[0]) could be gained.

One way to gain this factor is to use the set mechanism:

In [81]: %timeit np.apply_along_axis(lambda a: len(set(a)), 0, A)
183 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Unfortunately, this is not faster, since apply_along_axis still calls the Python function once per column.

Another way is to do it by hand:

def countunique(A, Amax):
    res = np.empty(A.shape[1:], A.dtype)
    c = np.empty(Amax + 1, A.dtype)   # presence table, one slot per possible value
    for i in range(A.shape[1]):
        for j in range(A.shape[2]):
            T = A[:, i, j]
            for k in range(c.size):
                c[k] = 0
            for x in T:
                c[x] = 1
            res[i, j] = c.sum()
    return res

At the Python level:

In [70]: %timeit countunique(A,100)
429 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is not so bad for a pure Python approach. Then just move this code to a lower level with numba:

import numba
countunique2=numba.jit(countunique)

In [71]: %timeit countunique2(A,100)
3.63 ms ± 70.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This will be difficult to improve on by much.
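If numba is not available, the presence-table idea behind countunique can also be sketched in NumPy alone with broadcasting. This is only an illustration and was not timed here. The name countunique_presence is made up, and it assumes integers between 0 and Amax. It also builds a temporary boolean array of shape (Amax+1,) + A.shape, so memory use grows quickly with the value range.

```python
import numpy as np

def countunique_presence(A, Amax):
    # Hypothetical NumPy-only variant; assumes integer values in 0..Amax.
    values = np.arange(Amax + 1)
    # present[v, i, j] is True when value v occurs somewhere in A[:, i, j].
    present = (A[None, ...] == values[:, None, None, None]).any(axis=1)
    # Summing over the value axis counts the distinct values per position.
    return present.sum(axis=0)

A = np.array([[[1, 1], [1, 0]],
              [[1, 2], [1, 0]],
              [[1, 0], [0, 0]]])
print(countunique_presence(A, 2))  # [[1 3]
                                   #  [2 1]]
```

For the 100x100x100 array with 100 possible values used in the timings, the temporary array would hold 10^8 booleans, so this form is mainly useful when the value range is small.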

Count unique elements along an axis of a NumPy array
