I'd like to match a word, then get everything before it back to the nearest preceding period or the start of the string.
For example, given this string and searching for the word "regex":
s = 'Do not match this. Or this. Or this either. I like regex. It is hard, but regex is also rewarding.'
It should return:
>> I like regex.
>> It is hard, but regex is also rewarding.
I'm trying to get my head around look-aheads and look-behinds, but (it seems) you can't easily look back until you hit something, only if it's immediately next to your pattern. I can get pretty close with this:
pattern = re.compile(r'(?:(?<=\.)|(?<=^))(.*?regex.*?\.)')
But the first match starts at the beginning of the string and runs through every earlier sentence up to "regex":
>> Do not match this. Or this. Or this either. I like regex. # no!
>> It is hard, but regex is also rewarding. # correct
You don't need to use lookarounds to do that. The negated character class is your best friend:
(?:[^\s.][^.]*)?regex[^.]*\.?
or
[^.]*regex[^.]*\.?
This way you take any characters before the word "regex" while forbidding any of them from being a dot.
The first pattern strips whitespace on the left; the second one is more basic.
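To see the difference between the two patterns, here is a minimal sketch that runs both against the sample string from the question with re.findall. The variable names trimmed and basic are only illustrative.

```python
import re

s = ('Do not match this. Or this. Or this either. '
     'I like regex. It is hard, but regex is also rewarding.')

# First pattern: a match may only begin on a character that is
# neither whitespace nor a dot, so leading spaces are dropped.
trimmed = re.findall(r'(?:[^\s.][^.]*)?regex[^.]*\.?', s)

# Second pattern: simpler, but it keeps the space that follows
# the previous sentence's period.
basic = re.findall(r'[^.]*regex[^.]*\.?', s)

print(trimmed)  # ['I like regex.', 'It is hard, but regex is also rewarding.']
print(basic)    # [' I like regex.', ' It is hard, but regex is also rewarding.']
```

If you use the second pattern, calling .strip() on each result gives the same output as the first.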
About your pattern:
Don't forget that a regex engine tries to succeed at each position of the string in turn, from left to right. That's why something like (?:(?<=\.)|(?<=^)).*?regex
doesn't always return the shortest substring between a dot or the start of the string and the word "regex", even if you use a non-greedy quantifier. The leftmost starting position always wins, and from there a non-greedy quantifier keeps taking characters (dots included) until the next subpattern succeeds.
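A small sketch of that behaviour, assuming the sample string from the question: the lookbehind version succeeds at position 0 and the lazy .*? then crosses every period on its way to "regex", whereas [^.]* cannot cross a period, so the attempts at earlier positions fail and the match starts inside the right sentence. The names lazy_match and negated_match are only illustrative.

```python
import re

s = ('Do not match this. Or this. Or this either. '
     'I like regex. It is hard, but regex is also rewarding.')

# The lookbehind is satisfied at the very start of the string, and
# the lazy .*? is allowed to consume dots, so the match begins there.
lazy_match = re.search(r'(?:(?<=\.)|(?<=^)).*?regex', s).group()

# [^.]* cannot consume a dot, so every attempt that starts in an
# earlier sentence fails before it can reach "regex".
negated_match = re.search(r'[^.]*regex', s).group()

print(lazy_match)     # 'Do not match this. Or this. Or this either. I like regex'
print(negated_match)  # ' I like regex'
```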
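As an aside, the negated character class is useful here once more:
to shorten (?:(?<=\.)|(?<=^))
you can write (?<![^.])
It reads "not preceded by a character that isn't a dot", which is true both right after a dot and at the start of the string.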
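As a quick check that the two assertions agree, the sketch below lists the positions where each zero-width pattern matches in a short test string. The string 'a.b.' and the helper name positions are made up for illustration, and the comparison assumes the default flags (no re.MULTILINE).

```python
import re

def positions(pattern, text):
    """Return the start offsets of every (zero-width) match of pattern."""
    return [m.start() for m in re.finditer(pattern, text)]

text = 'a.b.'

long_form = positions(r'(?:(?<=\.)|(?<=^))', text)
short_form = positions(r'(?<![^.])', text)

# Both match at the start of the string and right after each dot.
print(long_form)   # [0, 2, 4]
print(short_form)  # [0, 2, 4]
```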