Given two python lists of same length. How to return the best matches of similar values?

2024/10/2 1:36:54

Given are two python lists with strings in them (names of persons):

list_1 = ['J. Payne', 'George Bush', 'Billy Idol', 'M Stuart', 'Luc van den Bergen']
list_2 = ['John Payne', 'George W. Bush', 'Billy Idol', 'M. Stuart', 'Luc Bergen']

I want a mapping of the names, that are most similar.

'J. Payne'           -> 'John Payne'
'George Bush'        -> 'George W. Bush'
'Billy Idol'         -> 'Billy Idol'
'M Stuart'           -> 'M. Stuart'
'Luc van den Bergen' -> 'Luc Bergen'

Is there a neat way to do this in python? The lists contain in average 5 or 6 Names. Sometimes more, but this is seldom. Sometimes it is just one name in every list, which could be spelled slightly different.

Answer

Using the function defined here: http://hetland.org/coding/python/levenshtein.py

>>> for i in list_1:
...     print i, '==>', min(list_2, key=lambda j:levenshtein(i,j))
... 
J. Payne ==> John Payne
George Bush ==> George W. Bush
Billy Idol ==> Billy Idol
M Stuart ==> M. Stuart
Luc van den Bergen ==> Luc Bergen

You could use functools.partial instead of the lambda

>>> from functools import partial
>>> for i in list_1:
...     print i, '==>', min(list_2, key=partial(levenshtein,i))
...
J. Payne ==> John Payne
George Bush ==> George W. Bush
Billy Idol ==> Billy Idol
M Stuart ==> M. Stuart
Luc van den Bergen ==> Luc Bergen
https://en.xdnf.cn/q/70905.html

Related Q&A

Extracting Javascript gettext messages using Babel CLI extractor

It is stated here that Babel can extract gettext messages for Python and Javascript files.Babel comes with a few builtin extractors: python (which extractsmessages from Python source files), javascript…

Getting TTFB (time till first byte) for an HTTP Request

Here is a python script that loads a url and captures response time:import urllib2 import timeopener = urllib2.build_opener() request = urllib2.Request(http://example.com)start = time.time() resp = ope…

accessing kubernetes python api through a pod

so I need to connect to the python kubernetes client through a pod. Ive been trying to use config.load_incluster_config(), basically following the example from here. However its throwing these errors. …

Understanding DictVectorizer in scikit-learn?

Im exploring the different feature extraction classes that scikit-learn provides. Reading the documentation I did not understand very well what DictVectorizer can be used for? Other questions come to …

Parsing RSS with Elementtree in Python

How do you search for namespace-specific tags in XML using Elementtree in Python?I have an XML/RSS document like:<?xml version="1.0" encoding="UTF-8"?> <rss version=&quo…

String module object has no attribute join

So, I want to create a user text input box in Pygame, and I was told to look at a class module called inputbox. So I downloaded inputbox.py and imported into my main game file. I then ran a function in…

TypeError: the JSON object must be str, not Response with Python 3.4

Im getting this error and I cant figure out what the problem is:Traceback (most recent call last):File "C:/Python34/Scripts/ddg.py", line 8, in <module>data = json.loads(r)File "C:…

Redirect while passing message in django

Im trying to run a redirect after I check to see if the user_settings exist for a user (if they dont exist - the user is taken to the form to input and save them).I want to redirect the user to the app…

Django sorting by date(day)

I want to sort models by day first and then by score, meaning Id like to see the the highest scoring Articles in each day. class Article(models.Model):date_modified = models.DateTimeField(blank=True, n…

ImportError: No module named pynotify. While the module is installed

So this error keeps coming back.Everytime I try to tun the script it returns saying:Traceback (most recent call last):File "cli.py", line 11, in <module>import pynotify ImportError: No …