I have a three-dimensional array like
A=np.array([[[1,1],
             [1,0]],
            [[1,2],
             [1,0]],
            [[1,0],
             [0,0]]])
Now I would like to obtain an array that has a nonzero value in a given position if only a unique nonzero value (or zero) occurs in that position. It should have zero if only zeros or more than one nonzero value occur in that position. For the example above, I would like
[[1,0],
[1,0]]
since
- in A[:,0,0] there are only 1s
- in A[:,0,1] there are 0, 1 and 2, so more than one nonzero value
- in A[:,1,0] there are 0 and 1, so 1 is retained
- in A[:,1,1] there are only 0s
I can find how many nonzero elements there are with np.count_nonzero(A, axis=0), but I would like to keep 1s or 2s even if there are several of them. I looked at np.unique, but it doesn't seem to support what I'd like to do.
Ideally, I'd like a function like np.count_unique(A, axis=0) which would return an array in the original shape, e.g. [[1, 3],[2, 1]], so I could check whether 3 or more occur and then ignore that position.
All I could come up with was a list comprehension that iterates over the positions and gives the counts I'd like to obtain:
[[len(np.unique(A[:, i, j])) for j in range(A.shape[2])] for i in range(A.shape[1])]
Any other ideas?
You can use np.diff to stay at the numpy level for the second task.
def diffcount(A):
    B=A.copy()
    B.sort(axis=0)
    C=np.diff(B,axis=0)>0
    D=C.sum(axis=0)+1
    return D

# [[1 3]
#  [2 1]]
It seems to be a little faster on big arrays:
In [62]: A=np.random.randint(0,100,(100,100,100))

In [63]: %timeit diffcount(A)
46.8 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [64]: timeit [[len(np.unique(A[:, i, j])) for j in range(A.shape[2])]\
for i in range(A.shape[1])]
149 ms ± 700 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
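The same sort-and-diff idea can be extended to the first task in the question (keep the value where exactly one distinct nonzero value occurs, zero elsewhere). This is only a minimal sketch: the name unique_nonzero is made up for illustration, and it assumes all entries are non-negative, as in the example, so that the single nonzero value equals the maximum along axis 0.

```python
import numpy as np

def unique_nonzero(A):
    # Hypothetical helper; assumes every entry of A is >= 0.
    B = np.sort(A, axis=0)
    # Number of distinct values in each A[:, i, j] (zeros included).
    n_distinct = (np.diff(B, axis=0) > 0).sum(axis=0) + 1
    # Subtract one where a zero is present to count distinct nonzero values.
    has_zero = (A == 0).any(axis=0)
    n_nonzero = n_distinct - has_zero
    # With exactly one distinct nonzero value, that value is the maximum.
    return np.where(n_nonzero == 1, A.max(axis=0), 0)

A = np.array([[[1, 1], [1, 0]],
              [[1, 2], [1, 0]],
              [[1, 0], [0, 0]]])
print(unique_nonzero(A))
# [[1 0]
#  [1 0]]
```

If negative values can occur, the maximum is no longer guaranteed to be the nonzero value, so the last line would need a different selection.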
In principle, counting unique values is simpler than sorting, so a factor of ln(A.shape[0]) could be gained.
One way to gain this factor is to use the set mechanism:
In [81]: %timeit np.apply_along_axis(lambda a:len(set(a)),0,A)
183 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Unfortunately, this is not faster.
Another way is to do it by hand:
def countunique(A,Amax):
    res=np.empty(A.shape[1:],A.dtype)
    c=np.empty(Amax+1,A.dtype)
    for i in range(A.shape[1]):
        for j in range(A.shape[2]):
            T=A[:,i,j]
            for k in range(c.size): c[k]=0
            for x in T:
                c[x]=1
            res[i,j]= c.sum()
    return res
At python level:
In [70]: %timeit countunique(A,100)
429 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
That is not so bad for a pure Python approach. Then just move this code to a lower level with numba:
import numba
countunique2=numba.jit(countunique)

In [71]: %timeit countunique2(A,100)
3.63 ms ± 70.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This will be difficult to improve on by much.
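If numba is not available, the presence-table idea behind countunique can also be written with NumPy fancy indexing. The sketch below is an illustration only, not something I have timed: countunique_table is a made-up name, it assumes non-negative integers no larger than Amax, and it allocates (Amax+1) * A.shape[1] * A.shape[2] booleans, which can be a lot of memory when Amax is large.

```python
import numpy as np

def countunique_table(A, Amax):
    # Hypothetical helper; assumes 0 <= A <= Amax and an integer dtype.
    # seen[v, i, j] becomes True when value v occurs in A[:, i, j].
    seen = np.zeros((Amax + 1,) + A.shape[1:], dtype=bool)
    i, j = np.indices(A.shape[1:])
    # A has shape (n, I, J); i and j broadcast against it.
    seen[A, i, j] = True
    return seen.sum(axis=0)

A = np.array([[[1, 1], [1, 0]],
              [[1, 2], [1, 0]],
              [[1, 0], [0, 0]]])
print(countunique_table(A, 2))
# [[1 3]
#  [2 1]]
```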