I am trying to do the following with a regular expression:
import re
x = re.compile('[^(going)|^(you)]') # attempt: match everything except these words
s = 'I am going home now, thank you.' # string to modify
print(re.sub(x, '_', s))
The result I get is:
'_____going__o___no______n__you_'
The result I want is:
'_____going_________________you_'
Since the ^ can only be used for negation inside brackets [], this result makes sense, but I'm not sure how else to go about it.
I even tried '([^g][^o][^i][^n][^g])|([^y][^o][^u])', but it yields '_g_h___y_'.
Not quite as easy as it first appears, since REs have no simple "not" for a whole word: ^ inside [ ] only negates a single character (as you found). Here is my solution:
import re

def subit(m):
    # stuff is the text to mask, word is the kept word (or '' at the end)
    stuff, word = m.groups()
    return ("_" * len(stuff)) + word

s = 'I am going home now, thank you.' # string to modify
print(re.sub(r'(.+?)(going|you|$)', subit, s))
Gives:
_____going_________________you_
To explain: the RE itself (I always use raw strings) matches one or more of any character (.+) but is non-greedy (?), so it stops as soon as what follows can match. This is captured in the first group (the first pair of parentheses). That is followed by the second group, which matches either "going" or "you" or the end of the string ($).
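To see what the two groups actually capture, here is a small sketch that lists them for the example string. It uses the same pattern as above; the names pattern and pairs are just for illustration.

```python
import re

s = 'I am going home now, thank you.'
pattern = re.compile(r'(.+?)(going|you|$)')

# Collect (stuff, word) for each match, in order.
pairs = [m.groups() for m in pattern.finditer(s)]

for stuff, word in pairs:
    print(repr(stuff), repr(word))
```

The last match is the trailing "." followed by the end of the string, so its second group is empty.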
subit is a function (you can call it anything within reason) which is called for each substitution. A match object is passed, from which we can retrieve the captured groups. The first group we just need the length of, since we are replacing each character with an underscore. The returned string is substituted for the text that matched the pattern.
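One caveat, as far as I can tell: because .+? needs at least one character, a kept word at the very start of the string gets masked too. If that matters, a slightly different technique avoids it: match either a kept word or any single character, and only replace the single characters. Below is a minimal sketch of that alternative; mask_except and its parameters are hypothetical names of mine, not part of the re module.

```python
import re

def mask_except(text, keep_words, mask='_'):
    """Replace every character with mask, except occurrences of keep_words."""
    if not keep_words:
        return mask * len(text)
    # Escape the words so they are matched literally; longest first so
    # that a longer word wins over a shorter one it starts with.
    words = sorted(keep_words, key=len, reverse=True)
    alternatives = '|'.join(re.escape(w) for w in words)
    # Group 1 matches a kept word; otherwise "." matches one character.
    pattern = re.compile('(' + alternatives + ')|.')
    return pattern.sub(lambda m: m.group(1) or mask, text)

print(mask_except('I am going home now, thank you.', ['going', 'you']))
print(mask_except('going home', ['going']))
```

Like the original pattern, this keeps "you" inside longer words such as "your"; adding \b around the alternatives would restrict it to whole words.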