Comparing numpy array with itself by element efficiently

2024/9/22 4:07:53

I am performing a large number of these calculations:

A == A[np.newaxis].T

where A is a dense numpy array which frequently has common values.

For benchmarking purposes we can use:

n = 30000
A = np.random.randint(0, 1000, n)
A == A[np.newaxis].T

When I perform this calculation, I run into memory issues. I believe this is because the output isn't in more efficient bitarray or np.packedbits format. A secondary concern is we are performing twice as many comparisons as necessary, since the resulting Boolean array is symmetric.

The questions I have are:

  1. Is it possible to produce the Boolean numpy array output in a more memory efficient fashion without sacrificing speed? The options I know about are bitarray and np.packedbits, but I only know how to apply these after the large Boolean array is created.
  2. Can we utilise the symmetry of our calculation to halve the number of comparisons processed, again without sacrificing speed?

I will need to be able to perform & and | operations on Boolean arrays output. I have tried bitarray, which is super-fast for these bitwise operations. But it is slow to pack np.ndarray -> bitarray and then unpack bitarray -> np.ndarray.

[Edited to provide clarification.]

Answer

Here's one with numba to give us a NumPy boolean array as output -

from numba import njit@njit
def numba_app1(idx, n, s, out):for i,j in zip(idx[:-1],idx[1:]):s0 = s[i:j]c = 0for p1 in s0[c:]:for p2 in s0[c+1:]:out[p1,p2] = 1out[p2,p1] = 1c += 1return outdef app1(A):s = A.argsort()b = A[s]n = len(A)idx = np.flatnonzero(np.r_[True,b[1:] != b[:-1],True])out = np.zeros((n,n),dtype=bool)numba_app1(idx, n, s, out)out.ravel()[::out.shape[1]+1] = 1return out

Timings -

In [287]: np.random.seed(0)...: n = 30000...: A = np.random.randint(0, 1000, n)# Original soln
In [288]: %timeit A == A[np.newaxis].T
1 loop, best of 3: 317 ms per loop# @Daniel F's soln-1 that skips assigning lower diagonal in output
In [289]: %timeit sparse_outer_eq(A)
1 loop, best of 3: 450 ms per loop# @Daniel F's soln-2 (complete one)
In [291]: %timeit sparse_outer_eq(A)
1 loop, best of 3: 634 ms per loop# Solution from this post
In [292]: %timeit app1(A)
10 loops, best of 3: 66.9 ms per loop
https://en.xdnf.cn/q/71987.html

Related Q&A

Kivy: BoxLayout vs. GridLayout

BoxLayout(orientation=vertical) vs. GridLayout(cols=1):They both do the same thing, no? Is there a reason to choose one over the other?

Flask circular dependency

I am developing a Flask application. It is still relatively small. I had only one app.py file, but because I needed to do database migrations, I divided it into 3 using this guide:https://realpython.co…

How to create tox.ini variables

Is there a way to set arbitrary variables within tox.ini?An example would be a project name that might be used in a variety of ways. With a rather complex tox.ini, I find myself copy and pasting all …

How to apply json_normalize on entire pandas column

I have a dataframe with LISTS(with dicts) as column values . My intention is to normalize entire column(all rows). I found way to normalize a single row . However, Im unable to apply the same function …

Configure Vs code version 2.0.0 Build Task for python

I need help in configuring my Vs code to run scripts in python using Cntrl Shift B, I was working fine until Vs code upgraded to version 2.0.0 now it wants me to configure the Build. And I am clueless…

Generate N-Grams from strings with pandas

I have a DataFrame df like this: Pattern String 101 hi, how are you? 104 what are you doing? 108 Python is good to learn.I want to crea…

Merge dataframes on multiple columns with fuzzy match in Python

I have two example dataframes as follows:df1 = pd.DataFrame({Name: {0: John, 1: Bob, 2: Shiela}, Degree: {0: Masters, 1: Graduate, 2: Graduate}, Age: {0: 27, 1: 23, 2: 21}}) df2 = pd.DataFrame({Name: {…

Prevent Celery Beat from running the same task

I have a scheduled celery running tasks every 30 seconds. I have one that runs as task daily, and another one that runs weekly on a user specified time and day of the week. It checks for the "star…

Tastypie with application/x-www-form-urlencoded

Im having a bit of difficulty figuring out what my next steps should be. I am using tastypie to create an API for my web application. From another application, specifically ifbyphone.com, I am receivin…

Check for areas that are too thin in an image

I am trying to validate black and white images (more of a clipart images - not photos) for an engraving machine. One of the major things I need to take into consideration is the size of areas (or width…