I have two datasets (CSV files). Each contains the latitudes and longitudes of a set of points (220 points in one, 4400 in the other). I want to compute the pairwise distances in miles between the two sets (a 220 x 4400 matrix). How can I do that in Python? It is similar to this problem: https://gist.github.com/rochacbruno/2883505
The best option is sklearn, which has exactly what you ask for.
Say we have some sample data:
import numpy as np
import pandas as pd

towns = pd.DataFrame({
    "name": ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat": [36.01, 41.32, 40.84],
    "long": [-76.7, -89.20, -73.15],
})
museum = pd.DataFrame({
    "name": ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento", "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las", "National Air and Space Museum, Washington", "The Metropolitan Museum of Art", "Museum of the American Military Family & Learning Center"],
    "lat": [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long": [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531],
})
You can use sklearn's distance metrics, which include an implementation of the haversine (great-circle) distance:
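# Newer scikit-learn releases expose this class as sklearn.metrics.DistanceMetric;
# use that import if the one below fails.
from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('haversine')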
After you extract the latitude/longitude columns as numpy arrays with
places_gps = towns[["lat", "long"]].values
museum_gps = museum[["lat", "long"]].values
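you simply convert the degrees to radians, compute the pairwise matrix, and scale by the Earth's radius:
EARTH_RADIUS = 6371.009  # mean Earth radius in km
# the haversine metric expects [lat, long] pairs in radians
haversine_distances = dist.pairwise(np.radians(places_gps), np.radians(museum_gps))
haversine_distances *= EARTH_RADIUS
This gives the distances in km. If you need miles, multiply the result by 0.621371 (miles per km).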
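For reference, here is a minimal plain-Python sketch of the haversine formula itself, returning miles directly, similar in spirit to the gist linked in the question. It is not sklearn's implementation; the helper names `haversine_miles` and `pairwise_miles` and the radius constant (6371.009 km converted to miles) are assumptions made for illustration. It can be handy for spot-checking a few entries of the matrix.

```python
import math

# Assumed constant: 6371.009 km expressed in miles (6371.009 * 0.621371)
EARTH_RADIUS_MILES = 3958.76


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def pairwise_miles(points_a, points_b):
    """Return a len(points_a) x len(points_b) nested list of distances."""
    return [[haversine_miles(lat_a, lon_a, lat_b, lon_b)
             for (lat_b, lon_b) in points_b]
            for (lat_a, lon_a) in points_a]


# (lat, long) pairs taken from the sample data above
towns_gps = [(36.01, -76.7), (41.32, -89.20), (40.84, -73.15)]
museums_gps = [(38.887806, -77.019844), (40.778965, -73.962311)]

miles = pairwise_miles(towns_gps, museums_gps)
for row in miles:
    print([round(d, 1) for d in row])
```

A pure-Python double loop like this is fine for 220 x 4400 points as a sanity check, but the vectorized sklearn call will be considerably faster.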
If you are only interested in the closest few points, or in all points within a given radius, check out sklearn's BallTree, which also supports the haversine metric. It is much faster for those queries.
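A short sketch of the BallTree approach, assuming the same sample coordinates as above and a radius constant in miles (6371.009 km converted). The variable names and the 100-mile cutoff are illustrative choices, not part of the original answer. The tree is built on the museums, then queried with the towns.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.76  # assumed: 6371.009 km converted to miles

# [lat, long] in degrees, same sample data as above
towns_gps = np.array([[36.01, -76.7], [41.32, -89.20], [40.84, -73.15]])
museum_gps = np.array([
    [33.743511, -117.165161], [38.576942, -121.504997],
    [38.644302, -90.261154], [36.114269, -115.148315],
    [38.887806, -77.019844], [40.778965, -73.962311],
    [35.083359, -106.381531],
])

# The haversine metric works on [lat, long] in radians
tree = BallTree(np.radians(museum_gps), metric="haversine")

# Two nearest museums per town; distances come back in radians
dist_rad, idx = tree.query(np.radians(towns_gps), k=2)
nearest_miles = dist_rad * EARTH_RADIUS_MILES

# All museums within 100 miles of each town (the radius must be in radians too)
radius_miles = 100
within = tree.query_radius(np.radians(towns_gps),
                           r=radius_miles / EARTH_RADIUS_MILES)

print(idx)
print(nearest_miles.round(1))
print([list(w) for w in within])
```

`idx` holds row positions into `museum_gps`, so `museum.name.iloc[idx[:, 0]]` would give the name of the closest museum for each town.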
Edit: To convert the output to a dataframe, use for instance
pd_distances = pd.DataFrame(haversine_distances, columns=museum.name, index=towns.name)
pd_distances