Fast fuse of close points in a numpy-2d (vectorized)

2024/9/21 22:38:41

I have a question similar to the question asked here: simple way of fusing a few close points. I want to replace points that are located close to each other with the average of their coordinates. The closeness in cells is specified by the user (I am talking about euclidean distance).

In my case I have a lot of points (about 1-million). This method is working, but is time consuming as it uses a double for loop.

Is there a faster way to detect and fuse close points in a numpy 2d array?


To be complete I added an example:

points=array([[  382.49056159,   640.1731949 ],[  496.44669161,   655.8583119 ],[ 1255.64762859,   672.99699399],[ 1070.16520917,   688.33538171],[  318.89390168,   718.05989421],[  259.7106383 ,   822.2       ],[  141.52574427,    28.68594436],[ 1061.13573287,    28.7094536 ],[  820.57417943,    84.27702407],[  806.71416007,   108.50307828]])

A scatterplot of the points is visible below. The red circle indicates the points located close to each other (in this case a distance of 27.91 between the last two points in the array). So if the user would specify a minimum distance of 30 these points should be fused.

enter image description here

In the output of the fuse function the last to points are fused. This will look like:

#output
array([[  382.49056159,   640.1731949 ],[  496.44669161,   655.8583119 ],[ 1255.64762859,   672.99699399],[ 1070.16520917,   688.33538171],[  318.89390168,   718.05989421],[  259.7106383 ,   822.2       ],[  141.52574427,    28.68594436],[ 1061.13573287,    28.7094536 ],[  813.64416975,    96.390051175]])
Answer

If you have a large number of points then it may be faster to build a k-D tree using scipy.spatial.KDTree, then query it for pairs of points that are closer than some threshold:

import numpy as np
from scipy.spatial import KDTreetree = KDTree(points)
rows_to_fuse = tree.query_pairs(r=30)    print(repr(rows_to_fuse))
# {(8, 9)}print(repr(points[list(rows_to_fuse)]))
# array([[ 820.57417943,   84.27702407],
#        [ 806.71416007,  108.50307828]])

The major advantage of this approach is that you don't need to compute the distance between every pair of points in your dataset.

https://en.xdnf.cn/q/72014.html

Related Q&A

I use to_gbq on pandas for updating Google BigQuery and get GenericGBQException

While trying to use to_gbq for updating Google BigQuery table, I get a response of:GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, givi…

Something wrong with Keras code Q-learning OpenAI gym FrozenLake

Maybe my question will seem stupid.Im studying the Q-learning algorithm. In order to better understand it, Im trying to remake the Tenzorflow code of this FrozenLake example into the Keras code.My code…

How to generate month names as list in Python? [duplicate]

This question already has answers here:Get month name from number(18 answers)Closed 2 years ago.I have tried using this but the output is not as desired m = [] import calendar for i in range(1, 13):m.a…

Getting ERROR: Double requirement given: setuptools error in zappa

I tried to deploy my Flask app with zappa==0.52.0, but I get an error as below;ERROR: Double requirement given: setuptools (already in setuptools==52.0.0.post20210125, name=setuptools) WARNING: You are…

PySpark - Create DataFrame from Numpy Matrix

I have a numpy matrix:arr = np.array([[2,3], [2,8], [2,3],[4,5]])I need to create a PySpark Dataframe from arr. I can not manually input the values because the length/values of arr will be changing dyn…

RunTimeError during one hot encoding

I have a dataset where class values go from -2 to 2 by 1 step (i.e., -2,-1,0,1,2) and where 9 identifies the unlabelled data. Using one hot encode self._one_hot_encode(labels)I get the following error:…

Is there a Mercurial or Git version control plugin for PyScripter? [closed]

Closed. This question is seeking recommendations for books, tools, software libraries, and more. It does not meet Stack Overflow guidelines. It is not currently accepting answers.We don’t allow questi…

How to make a color map with many unique colors in seaborn

I want to make a colormap with many (in the order of hundreds) unique colors. This code: custom_palette = sns.color_palette("Paired", 12) sns.palplot(custom_palette)returns a palplot with 12 …

Swap column values based on a condition in pandas

I would like to relocate columns by condition. In case country is Japan, I need to relocate last_name and first_name reverse.df = pd.DataFrame([[France,Kylian, Mbappe],[Japan,Hiroyuki, Tajima],[Japan,…

How to improve performance on a lambda function on a massive dataframe

I have a df with over hundreds of millions of rows.latitude longitude time VAL 0 -39.20000076293945312500 140.80000305175781250000 1…