Distance matrix in Python Pandas

2024/11/13 9:25:11

I am a newbie in python, but I like to process data in pandas.

I have a hundred pairs of CSV data such as passenger and bus stop data. The passenger structure data is Person, and XY coordinates (UTM-Meter). The bus stop data structure is the bus stop name (BS), and XY coordinates (UTM-Meter. My data looks like this:

df(person)
Person  X           Y
A_first 260357.3199 4064458.685
A_last  261618.5243 4064789.928
B_first 258270.5926 4063717.789
B_last  258270.5926 4063717.789
C_first 259051.758  4064462.021
C_last  258270.5926 4063717.789
D_first 260764.3916 4064624.977
D_last  260753.9053 4064569.745
E_first 258270.5926 4063717.789
E_last  258270.5926 4063717.789
F_first 258262.2825 4063740.234
F_last  258270.5926 4063717.789
G_first 258326.2393 4064104.899
G_last  258270.5926 4063717.789
H_first 259491.5911 4064838.328
H_last  260747.1408 4064647.669

and,

df(bus stop)
BS  X           Y 
BS1 258280.8228 4063715.835
BS2 259961.3734 4064840.298
BS3 260360.0219 4064468.593
BS4 260530.5126 4064683.101
BS5 260806.7009 4064399.184
BS6 261178.4165 4063605.91
BS7 261222.1745 4064495.158
BS8 261627.6173 4064281.38
BS9 261662.8833 4064793.444

I want to analyze the distance matrix between person and bus stop data. My desire output is InputID (Person), TargetID (bus stop) and distance. Distance is the distance between person point data and bus stop point data. My expected result:

result
InputID (Person)    TargetID (BS)   Distance (meter)
A_first                 BS1                 xx
A_first                 BS2                 xx
A_first                 BS3                 xx
A_first                 BS4                 xx
A_first                 BS5                 xx
A_first                 BS6                 xx
A_first                 BS7                 xx
A_first                 BS8                 xx
A_first                 BS9                 xx
A_last                  BS1                 xx
A_last                  BS2                 xx
A_last                  BS3                 xx
A_last                  BS4                 xx
A_last                  BS5                 xx
A_last                  BS6                 xx
A_last                  BS7                 xx
A_last                  BS8                 xx
A_last                  BS9                 xx
............. so on

Any advice for me? thank you

Answer

You can first create an outer join using

import pandas as pd
import numpy as npdf1 = pd.DataFrame({'Person': ['A_first', 'A_last', 'B_first', 'B_last'],'X': [260357.3199, 261618.5243, 258270.5926, 258270.5926],'Y': [4064458.685, 4064789.928, 4063717.789, 4063717.789],
})df2 = pd.DataFrame({'BS': ['BS1', 'BS2'],'X': [258280.8228, 259961.3734],'Y': [4063715.835, 4064840.298],
})df1['key'] = 0
df2['key'] = 0df_cartesian = df1.merge(df2, on='key').drop(columns=['key'])

Then calculate the row-wise X and Y difference, and take the L2 norm:

df_cartesian['X_diff'] = df_cartesian['X_x'] - df_cartesian['X_y']
df_cartesian['Y_diff'] = df_cartesian['Y_x'] - df_cartesian['Y_y']df_cartesian['dist'] = np.linalg.norm(df_cartesian[['X_diff', 'Y_diff']], axis=1)df_cartesian
#     Person          X_x          Y_x   BS          X_y          Y_y  \
# 0  A_first  260357.3199  4064458.685  BS1  258280.8228  4063715.835   
# 1  A_first  260357.3199  4064458.685  BS2  259961.3734  4064840.298   
# 2   A_last  261618.5243  4064789.928  BS1  258280.8228  4063715.835   
# 3   A_last  261618.5243  4064789.928  BS2  259961.3734  4064840.298   
# 4  B_first  258270.5926  4063717.789  BS1  258280.8228  4063715.835   
# 5  B_first  258270.5926  4063717.789  BS2  259961.3734  4064840.298   
# 6   B_last  258270.5926  4063717.789  BS1  258280.8228  4063715.835   
# 7   B_last  258270.5926  4063717.789  BS2  259961.3734  4064840.298   
# 
#       X_diff    Y_diff         dist  
# 0  2076.4971   742.850  2205.372152  
# 1   395.9465  -381.613   549.911004  
# 2  3337.7015  1074.093  3506.269681  
# 3  1657.1509   -50.370  1657.916235  
# 4   -10.2302     1.954    10.415138  
# 5 -1690.7808 -1122.509  2029.474358  
# 6   -10.2302     1.954    10.415138  
# 7 -1690.7808 -1122.509  2029.474358  
https://en.xdnf.cn/q/119724.html

Related Q&A

calculating catalan numbers using memoization

I am tring to use memoization in order to calculate catalan numbers, but it just does not seem to work, what do I need to change?def catalan_mem(n, memo = None):if n==0:return 1if memo == None:memo = …

cron python file doesnt work on centos 7

im trying to schedule my pythohn script into Centos 7 with cron. On my script at start i have added this:#!/usr/local/bin/pythonand this is my cron file that i have create into folder that contain pyth…

Comma separated Matrix from txt files - continued

I need to form a matrix from a list of textfiles containing frequency distribution of expressions. Therefore, I created a list of all that text files (lof) from a directory and used it to build a matri…

Questions about training LLMs on large text datasets for text generation from scratch

I made a fully custom made GPT in Jax (with Keras 3), using Tensorflow for the data pipeline. Ive trained the model on the Shakespeare dataset and got good results (so no problem with the model). Now I…

Pandas - Update/Merge 2 Dataframes based on multiple matching column values

I have 2 dataframes left_df and right-df, which both have 20 columns with identical names and dtypes. right_df also has 2 additional columns with unique values on every row. I want to update rows in ri…

How do I fix scrapy Unsupported URL scheme error?

I collect url from command python and then insert it into start_urls from flask import Flask, jsonify, request import scrapy import subprocessclass ClassSpider(scrapy.Spider):name = mySpider#sta…

comparing two timeseries dataframes based on some conditions in pandas

I have two timeseries dataframes df1 and df2: df1 = pd.DataFrame({date_1:[10/11/2017 0:00,10/11/2017 03:00,10/11/2017 06:00,10/11/2017 09:00],value_1:[5000,1500,np.nan,2000]})df1[date_1] = pd.to_dateti…

Game of Chance in Python 3.x?

I have this problem in my python code which is a coinflip game, the problem is that when It asks, "Heads or Tails?" and I just say 1 or Heads(same for 2 and Tails) without quotation marks an…

Count occurence of a word by ID in python

Following is the content of a file,My question is how to count the number of occurences for the word "optimus" for different IDs ID67 DATEUID Thank you for choosing Optimus prime. Please w…

ModuleNotFoundError: No module named plyer in Python

I am trying to write a program notify.py (location: desktop) that uses plyer library to get a notification on windows 10. I used pip install plyer and am using vs code to run the program but I get an e…