Why does scipy linear interpolation run faster than nearest neighbor interpolation?

2024/10/15 3:10:09

I've written a routine that interpolates point data onto a regular grid. However, I find that scipy's implementation of nearest neighbor interpolation performs almost twice as slow as the radial basis function I'm using for linear interpolation (scipy.interpolate.Rbf)

Relevant code includes how the interpolators are constructed

if interpolation_mode == 'linear':interpolator = scipy.interpolate.Rbf(point_array[:, 0], point_array[:, 1], value_array,function='linear', smooth=.01)
elif interpolation_mode == 'nearest':interpolator = scipy.interpolate.NearestNDInterpolator(point_array, value_array)

And when the interpolation is called

result = interpolator(col_coords.ravel(), row_coords.ravel())

The sample I'm running on has 27 input interpolant value points and I'm interpolating across nearly a 20000 X 20000 grid. (I'm doing this in memory block sizes so I'm not exploding the computer btw.)

Below are the result of two cProfiles I've run on the relevant code. Note that the nearest neighbor scheme runs in 406 seconds while the linear scheme runs in 256 seconds. The nearest scheme is dominated by calls to scipy's kdTree, which seems reasonable, except the rbf outperforms it by a significant amount of time. Any ideas why or what I could do to make my nearest scheme run faster than linear?

Linear run:

     25362 function calls in 225.886 secondsOrdered by: internal timeList reduced from 328 to 10 due to restriction <10>ncalls  tottime  percall  cumtime  percall filename:lineno(function)253  169.302    0.669  207.516    0.820 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:112(
_euclidean_norm)258   38.211    0.148   38.211    0.148 {method 'reduce' of 'numpy.ufunc' objects}252    6.069    0.024    6.069    0.024 {numpy.core._dotblas.dot}1    5.077    5.077  225.332  225.332 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post2
8+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)252    1.849    0.007    2.137    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:32
85(meshgrid)507    1.419    0.003    1.419    0.003 {method 'flatten' of 'numpy.ndarray' objects}1268    1.368    0.001    1.368    0.001 {numpy.core.multiarray.array}252    1.018    0.004    1.018    0.004 {_gdal_array.BandRasterIONumPy}1    0.533    0.533  225.886  225.886 pygeoprocessing\tests\helper_driver.py:10(interpolate)252    0.336    0.001  216.716    0.860 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:225(
__call__)

Nearest neighbor run:

     27539 function calls in 405.624 secondsOrdered by: internal timeList reduced from 309 to 10 due to restriction <10>ncalls  tottime  percall  cumtime  percall filename:lineno(function)252  397.806    1.579  397.822    1.579 {method 'query' of 'ckdtree.cKDTree' objects}252    1.875    0.007    1.881    0.007 {scipy.interpolate.interpnd._ndim_coords_from_arrays}252    1.831    0.007    2.101    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)252    1.034    0.004  400.739    1.590 C:\Python27\lib\site-packages\scipy\interpolate\ndgriddata.py:60(__call__)1    1.009    1.009  405.030  405.030 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)252    0.719    0.003    0.719    0.003 {_gdal_array.BandRasterIONumPy}1    0.509    0.509  405.624  405.624 pygeoprocessing\tests\helper_driver.py:10(interpolate)252    0.261    0.001    0.261    0.001 {numpy.core.multiarray.copyto}27    0.125    0.005    0.125    0.005 {_ogr.Layer_CreateFeature}1    0.116    0.116    0.254    0.254 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:362(_parse_point_data)

For reference, I'm also including the visual result of these two test cases.

Nearest

Nearest

Linear

Linear

Answer

Running the example in the griddata doc:

In [47]: def func(x, y):return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2....: 
In [48]: points = np.random.rand(1000, 2)
In [49]: values = func(points[:,0], points[:,1])
In [50]: grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

So we have 1000 scattered points, and will interpolate at 20,000.

In [52]: timeit interpolate.griddata(points, values, (grid_x, grid_y),method='nearest')
10 loops, best of 3: 83.6 ms per loopIn [53]: timeit interpolate.griddata(points, values, (grid_x, grid_y),method='linear')
1 loops, best of 3: 24.6 ms per loopIn [54]: timeit interpolate.griddata(points, values, (grid_x, grid_y), method='cubic')
10 loops, best of 3: 42.7 ms per loop

and for the 2 stage interpolators:

In [55]: %%timeit 
rbfi = interpolate.Rbf(points[:,0],points[:,1],values)
dl = rbfi(grid_x.ravel(),grid_y.ravel())....: 
1 loops, best of 3: 3.89 s per loopIn [56]: %%timeit 
ndi=interpolate.NearestNDInterpolator(points, values)
dl=ndi(grid_x.ravel(),grid_y.ravel())....: 
10 loops, best of 3: 82.6 ms per loopIn [57]: %%timeit 
ldi=interpolate.LinearNDInterpolator(points, values)
dl=ldi(grid_x.ravel(),grid_y.ravel())....
10 loops, best of 3: 25.1 ms per loop

griddata is actually a 1 step cover call for these last two versions.

griddata describes its methods as:

nearest
return the value at the data point closest to the point ofinterpolation. See NearestNDInterpolator for more details.Uses scipy.spatial.cKDTreelinear
tesselate the input point set to n-dimensional simplices, and interpolate linearly on each simplex. LinearNDInterpolator details are:The interpolant is constructed by triangulating the input data with Qhull [R37], and on each triangle performing linear barycentric interpolation.cubic (2-D)
return the value determined from a piecewise cubic, continuously differentiable (C1), and approximately curvature-minimizing polynomial surface. See CloughTocher2DInterpolator for more details.

Further tests on the 2 stage versions shows that setting up the nearest cKTtree is very fast; most of the time is spent in the 2nd interpolation state.

On the other hand, setting up the triangulated surface takes longer than the linear interpolation.

I don't know enough of the Rbf method to say why that is so much slower. The underlying methods are so different that intuitions developed with simple manual methods of interpolation don't mean much.

Your example starts with fewer scattered points, and interpolates on a much finer grid.

https://en.xdnf.cn/q/69335.html

Related Q&A

How do I create a 404 page?

My application catches all url requests with an @app.route, but occasionally I bump into a bad url for which I have no matching jinja file (bu it does match an existing @app.route). So I want to redire…

Injecting pre-trained word2vec vectors into TensorFlow seq2seq

I was trying to inject pretrained word2vec vectors into existing tensorflow seq2seq model.Following this answer, I produced the following code. But it doesnt seem to improve performance as it should, a…

MySQL Stored Procedures, Pandas, and Use multi=True when executing multiple statements

Note - as MaxU suggested below, the problem is specific to mysql.connector and does not occur if you use pymysql. Hope this saves someone else some headachesUsing Python, Pandas, and mySQL and cannot…

How can I change the font size in GTK?

Is there an easy way to change the font size of text elements in GTK? Right now the best I can do is do set_markup on a label, with something silly like:lbl.set_markup("<span font_desc=Tahoma …

How to read BigQuery table using python pipeline code in GCP Dataflow

Could someone please share syntax to read/write bigquery table in a pipeline written in python for GCP Dataflow

How can I wrap a python function in a way that works with with inspect.signature?

Some uncontroversial background experimentation up front: import inspectdef func(foo, bar):passprint(inspect.signature(func)) # Prints "(foo, bar)" like youd expectdef decorator(fn):def _wra…

Python OpenCV Error: TypeError: Image data cannot be converted to float

So I am trying to create a Python Program to detect similar details in two images using Pythons OpenCV. I have the two images and they are in my current directory, and they exist (see the code in line…

Specify timestamp on each packet in Scapy?

With Scapy, when I create a packet and write it to a pcap file, it sets the timestamp of the packet to the current time.This is my current usage. 1335494712.991895 being the time I created the packet:&…

Converting a dataframe to dictionary with multiple values

I have a dataframe likeSr.No ID A B C D1 Tom Earth English BMW2 Tom Mars Spanish BMW Green 3 Michael Mercury Hindi …

How do I create KeyPoints to compute SIFT?

I am using OpenCV-Python.I have identified corner points using cv2.cornerHarris. The output is of type dst.I need to compute SIFT features of the corner points. The input to sift.compute() has to be of…