Regex match back to a period or start of string

2024/11/13 9:38:21

I'd like to match a word, then get everything before it, back to the nearest preceding period or the start of the string.

For example, given this string and searching for the word "regex":

s = 'Do not match this. Or this. Or this either. I like regex. It is hard, but regex is also rewarding.'

It should return:

>> I like regex.
>> It is hard, but regex is also rewarding.

I'm trying to get my head around look-aheads and look-behinds, but (it seems) you can't easily look back until you hit something, only if it's immediately next to your pattern. I can get pretty close with this:

pattern = re.compile(r'(?:(?<=\.)|(?<=^))(.*?regex.*?\.)')

But the first match starts at the beginning of the string and runs through every earlier sentence up to "regex":

>> Do not match this. Or this. Or this either. I like regex.  # no!
>> It is hard, but regex is also rewarding.                   # correct

Answer

You don't need to use lookarounds to do that. The negated character class is your best friend:

(?:[^\s.][^.]*)?regex[^.]*\.?

or

[^.]*regex[^.]*\.?

This way you match any characters before the word "regex" while forbidding any of them from being a dot, so the match can never reach back past the previous period.

The first pattern strips the whitespace on the left; the second one is more basic and keeps it.
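As a quick check, both patterns can be run against the string from the question with re.findall. This is only a minimal sketch: the variable names stripped and basic are made up for illustration, and the outputs in the comments are what the patterns give on this sample string.

```python
import re

# The sample string from the question, split across two literals for width.
s = ('Do not match this. Or this. Or this either. '
     'I like regex. It is hard, but regex is also rewarding.')

# First pattern: the optional group must begin with a character that is
# neither whitespace nor a dot, so a match never starts with a space.
stripped = re.findall(r'(?:[^\s.][^.]*)?regex[^.]*\.?', s)
print(stripped)
# ['I like regex.', 'It is hard, but regex is also rewarding.']

# Second pattern: simpler, but it keeps the space after the previous dot.
basic = re.findall(r'[^.]*regex[^.]*\.?', s)
print(basic)
# [' I like regex.', ' It is hard, but regex is also rewarding.']
```

If the leading space from the second pattern is acceptable, calling str.strip() on each result gives the same output as the first pattern for this input.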

About your pattern:

Don't forget that a regex engine tries to succeed at each position of the string in turn, from left to right. That's why something like (?:(?<=\.)|(?<=^)).*?regex doesn't always return the shortest substring between a dot (or the start of the string) and the word "regex", even if you use a non-greedy quantifier. The leftmost starting position always wins, and from there a non-greedy quantifier keeps taking characters, dots included, until the next subpattern succeeds.
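The leftmost-position rule can be seen by running the pattern from the question on its own sample string. A small sketch, with the variable name first chosen only for this illustration:

```python
import re

s = ('Do not match this. Or this. Or this either. '
     'I like regex. It is hard, but regex is also rewarding.')

pattern = re.compile(r'(?:(?<=\.)|(?<=^))(.*?regex.*?\.)')

# The lookbehind (?<=^) already succeeds at index 0, so the engine commits
# to that starting point. The lazy .*? then grows one character at a time,
# crossing dots freely, until "regex" can finally match.
first = pattern.search(s)
print(first.start())   # 0
print(first.group(1))
# Do not match this. Or this. Or this either. I like regex.

# The second match starts right after the dot that ended the first one,
# so it looks correct apart from the leading space.
print(pattern.findall(s)[1])
#  It is hard, but regex is also rewarding.
```

Non-greedy only controls how far the match extends to the right from a fixed start; it does not make the engine prefer a later start.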

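As an aside, the negated character class is useful once more here: to shorten (?:(?<=\.)|(?<=^)) you can write (?<![^.]), a negative lookbehind meaning "not preceded by a character other than a dot". That is true only right after a dot or at the start of the string.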
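One way to convince yourself that the two lookbehinds agree is to list the positions where each zero-width assertion succeeds. This is a sketch on a made-up test string ('a.b.c' is not from the question), assuming Python 3.7 or later and no re.MULTILINE flag:

```python
import re

long_form = re.compile(r'(?:(?<=\.)|(?<=^))')   # preceded by a dot, or at the start
short_form = re.compile(r'(?<![^.])')           # not preceded by a non-dot character

sample = 'a.b.c'  # hypothetical test string

# Both patterns are zero-width, so each match marks a position in the string.
long_positions = [m.start() for m in long_form.finditer(sample)]
short_positions = [m.start() for m in short_form.finditer(sample)]

print(long_positions)   # [0, 2, 4]
print(short_positions)  # [0, 2, 4]
```

Both forms accept the start of the string and the positions right after each dot, and nothing else.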
