On the PyPI page of the awesome regex module (https://pypi.python.org/pypi/regex) it is stated that \G can be used "in negative variable-length lookbehinds to limit how far back the lookbehind goes". Very interesting, but the page doesn't give any example, and my white-belt regex-fu simply chokes when I try to imagine one.
Could anyone describe some sample use case?
Here's an example that uses \G and a negative lookbehind creatively:
regex.match(r'\b\w+\b(?:\s(\w+\b)(?<!\G.*\b\1\b.*\b\1\b))*', words)
words should be a string of alphanumeric words separated by single whitespace characters, for example "a b c d e a b b c d".
The pattern will match a sequence of unique words, stopping before the first word that repeats an earlier one.
\w+ - match the first word.
(?:\s(\w+\b) )* - match additional words ...
(?<!\G.*\b\1\b.*\b\1\b) - ... but for each new word added, check that it didn't already appear, looking back only as far as \G.
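To make the intended result concrete, here is a minimal sketch of the same check in plain Python, with no regex involved. The function name unique_prefix is hypothetical, and it assumes words are separated by exactly one space, as described above. It illustrates what the pattern is meant to match; it does not show how the regex engine gets there.

```python
def unique_prefix(words: str) -> str:
    """Return the longest leading run of distinct, space-separated words."""
    seen = set()   # words accepted so far (everything between the start and "here")
    kept = []
    for word in words.split(" "):
        if word in seen:
            # Same condition the negative lookbehind rejects:
            # the newly added word already occurred earlier in the match.
            break
        seen.add(word)
        kept.append(word)
    return " ".join(kept)


print(unique_prefix("a b c d e a b b c d"))  # a b c d e
```

The lookbehind does the job of the seen set: instead of remembering words, it rescans the text between \G and the current position each time a word is added.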
A lookbehind at the end of the pattern that is limited at \G can assert another condition on the current match, which would not have been possible otherwise. Basically, the pattern is a variation on using lookaheads for AND logic in regular expressions, but is not limited to the whole string.
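For comparison, this is roughly what the usual lookahead AND idiom looks like when it is applied to the whole string. It is only a sketch using the standard re module; the requirement (the text must contain both "foo" and "bar" as whole words) and the name has_both are made up for illustration.

```python
import re

# Each lookahead starts at the beginning of the string and asserts one condition.
# Both must hold (AND) before the final .* consumes the text.
BOTH = re.compile(r'^(?=.*\bfoo\b)(?=.*\bbar\b).*$')


def has_both(text: str) -> bool:
    """True if text contains both 'foo' and 'bar' as whole words, in any order."""
    return BOTH.match(text) is not None


print(has_both("bar comes before foo"))  # True
print(has_both("only foo here"))         # False
```

Here every condition is checked from the start of the string. The \G lookbehind turns this around: the conditions are checked backwards from the current position, over just the text matched so far.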
Here's a working example in .NET, which supports the same features.
Trying the same pattern in Python 2 with findall and the regex module gives me a segmentation fault, but match seems to work.