How to determine if two rows are identical (similar) if row 2 contains part of the info from row 1?

2024/10/5 17:19:10

Hope you are having a good day. I am currently working with an extremely dirty dataframe containing First Name, Last Name, and Middle Name. One of the issues that I am trying to resolve looks like this:

First Name Last Name
James Agnew Bond
James Bond

Another similar issue that I am trying to resolve looks as follows:

First Name Last Name
Jam Bond
James Bond

Looking forward to your ideas.

Thanks!

Edit: FYI, to make life simpler, I already have the data grouped by address, and each address is unique. So two rows will share one address, another two or three rows will share another address, and so on.

Answer

This is not a simple problem. To check whether two strings are 'similar', you need a string distance algorithm rather than an ordinary Euclidean distance. In other words, you must define a similarity function and use it to measure the distance between the strings.
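As a rough illustration of what such a similarity function could look like for the two cases in the question, here is a minimal sketch that uses only Python's standard library (`difflib`). The helper names (`normalize`, `token_similarity`, `names_match`) and the 0.7 threshold are assumptions made for this example, not something from the question, and the cutoff would need tuning on real data.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a full name and split it into whitespace-separated tokens."""
    return name.lower().split()


def token_similarity(a, b):
    """Similarity ratio in [0, 1] between two single tokens."""
    return SequenceMatcher(None, a, b).ratio()


def names_match(name_a, name_b, threshold=0.7):
    """Return True if every token of the shorter name has a close
    counterpart in the longer name.

    `threshold` is an assumed cutoff; a token also counts as matched
    when it is a prefix of a token in the longer name ("jam" / "james").
    """
    tokens_a, tokens_b = normalize(name_a), normalize(name_b)
    shorter, longer = sorted((tokens_a, tokens_b), key=len)
    if not shorter:
        return False
    return all(
        any(
            token_similarity(short_tok, long_tok) >= threshold
            or long_tok.startswith(short_tok)
            for long_tok in longer
        )
        for short_tok in shorter
    )


print(names_match("James Agnew Bond", "James Bond"))  # extra middle name
print(names_match("Jam Bond", "James Bond"))          # truncated first name
print(names_match("Mary Smith", "James Bond"))        # unrelated names
```

Since the rows are already grouped by address, a check like this would only need to compare the few names within each address group rather than every pair in the dataframe.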

jellyfish is a library designed to solve this kind of problem.
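To my knowledge, jellyfish offers ready-made string metrics such as `levenshtein_distance` and `jaro_winkler_similarity`, plus phonetic encodings such as `soundex`; check its documentation for the exact names in your version. To show what an edit-distance metric actually computes without requiring the library, here is a pure-Python sketch of the Levenshtein distance. The function name `levenshtein` is chosen for this example and is not jellyfish's own implementation.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("jam", "james"))  # 2: two characters appended
print(levenshtein("bond", "bond"))  # 0: identical strings
```

A small distance relative to the length of the names suggests a typo or truncation. Where to draw the line is a judgment call that depends on the data.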

Another approach is to collect all the names and match them against a thesaurus of names like this

With a little searching, I found this

Hope this helps.

