How can I find the best fuzzy string match?

2024/9/8 10:39:17

Python's new regex module supports fuzzy string matching. Sing praises aloud (now).

Per the docs:

The ENHANCEMATCH flag makes fuzzy matching attempt to improve the fit of the next match that it finds.

The BESTMATCH flag makes fuzzy matching search for the best match instead of the next match.

The ENHANCEMATCH flag is set using (?e) as in

regex.search("(?e)(dog){e<=1}", "cat and dog")[1] returns "dog"

but there's nothing on actually setting the BESTMATCH flag. How's it done?

Answer

Documentation on the BESTMATCH flag functionality is partial (but improving). Poke-n-hope shows that BESTMATCH is set using (?b).

>>> import regex
>>> regex.search(r"(?e)(?:hello){e<=4}", "What did you say, oh - hello")[0]
'hat d'
>>> regex.search(r"(?b)(?:hello){e<=4}", "What did you say, oh - hello")[0]
'hello'
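
As a complement to the inline (?b) form above, here is a minimal sketch that sets the same flag through the flags argument and also reports how many errors the match needed. It assumes the third-party regex package is installed (it is not the standard-library re module); regex.BESTMATCH and the match attribute fuzzy_counts come from that package, while the helper name best_fuzzy_match is made up for illustration.

```python
import regex  # third-party package (pip install regex), not the stdlib "re"


def best_fuzzy_match(pattern, text):
    """Hypothetical helper: return (matched_text, fuzzy_counts) or None.

    fuzzy_counts is a tuple of (substitutions, insertions, deletions).
    """
    # Passing regex.BESTMATCH here is assumed equivalent to prefixing
    # the pattern with (?b).
    match = regex.search(pattern, text, flags=regex.BESTMATCH)
    if match is None:
        return None
    return match[0], match.fuzzy_counts


result = best_fuzzy_match(r"(?:hello){e<=4}", "What did you say, oh - hello")
print(result)
```

With the input from the answer above, the matched text should again be 'hello', and the error counts show whether the match was exact.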