Example of using \G in negative variable-length lookbehinds to limit how far back the lookbehind goes

2024/9/30 3:36:37

The PyPI page of the awesome regex module (https://pypi.python.org/pypi/regex) states that \G can be used "in negative variable-length lookbehinds to limit how far back the lookbehind goes". Very interesting, but the page doesn't give any example, and my white-belt regex-fu simply chokes when I try to imagine one.

Could anyone describe a sample use case?

Answer

Here's an example that uses \G and a negative lookbehind creatively:

regex.match(r'\b\w+\b(?:\s(\w+\b)(?<!\G.*\b\1\b.*\b\1\b))*', words)

words should be a string of alphanumeric words separated by single whitespace characters, for example "a b c d e a b b c d".

The pattern matches a leading sequence of unique words and stops before the first word that repeats an earlier one.

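  • \b\w+\b - match the first word.
  • (?:\s(\w+\b)(?<!...))* - match additional words, each preceded by one whitespace character and captured in group 1 ...
  • (?<!\G.*\b\1\b.*\b\1\b) - ... but after each new word, look back as far as \G and fail if the word in group 1 occurs twice in that span, meaning it had already appeared earlier in the match.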
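
To make the logic of the three parts concrete, here is a minimal sketch that performs the same check in plain Python, assuming the input is well formed (words separated by single spaces). The helper name unique_word_prefix is made up for illustration and is not part of the regex module. If the third-party regex module happens to be installed, the sketch also runs the pattern itself so the two results can be compared.

```python
# Hypothetical helper (not part of the regex module): a plain-Python
# rendering of what the pattern checks, word by word.
def unique_word_prefix(words):
    """Return the longest leading run of distinct words in `words`.

    Assumes words are separated by single spaces.
    """
    seen = []
    for word in words.split(" "):
        # Same role as the negative lookbehind: reject a word that
        # already appeared between the start of the match and here.
        if word in seen:
            break
        seen.append(word)
    return " ".join(seen)


sample = "a b c d e a b b c d"
print(unique_word_prefix(sample))  # a b c d e

# Optional comparison with the pattern from the answer. The regex module
# is a third-party package, so skip this step when it is not available.
try:
    import regex
except ImportError:
    regex = None

if regex is not None:
    pattern = r'\b\w+\b(?:\s(\w+\b)(?<!\G.*\b\1\b.*\b\1\b))*'
    print(regex.match(pattern, sample).group(0))
```

The plain-Python loop keeps an explicit list of words seen so far, whereas the pattern gets the same effect by letting the lookbehind rescan the text back to \G after every new word.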

A lookbehind placed at the end of the pattern and bounded by \G can assert an additional condition on the text matched so far, which would not be possible otherwise. Essentially, the pattern is a variation on using lookaheads for AND logic in regular expressions, except that the condition applies to the current match rather than to the whole string.

Here's a working example in .NET, which shares the same features.
Trying the same pattern in Python 2 with findall and the regex module gives me a segmentation fault, but match seems to work.
