Question

I am trying to do the following with a regular expression:

import re
x = re.compile('[^(going)|^(you)]')    # meant as: keep these words, replace everything else
s = 'I am going home now, thank you.' # string to modify
print(re.sub(x, '_', s))

The result I get is:

'_____going__o___no______n__you_'

The result I want is:

'_____going_________________you_'

Since ^ only means "not" inside brackets [], and a bracket expression matches single characters rather than whole words, this result makes sense, but I'm not sure how else to go about it.

I even tried '([^g][^o][^i][^n][^g])|([^y][^o][^u])' but it yields '_g_h___y_'.
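To see why the first pattern behaves this way: inside [ ], the parentheses, the | and the second ^ are ordinary characters, and repeated letters add nothing. The whole thing is therefore one negated set of single characters. The check below (the rewritten class `[^()|^ginouy]` is only an illustration) shows that it gives the same result as listing each character once:

```python
import re

s = 'I am going home now, thank you.'

# The original class: the leading ^ negates the whole set; ( ) | and the
# second ^ are literal members, and repeated letters add nothing new.
original = re.sub(r'[^(going)|^(you)]', '_', s)

# The same set of characters, each listed once.
equivalent = re.sub(r'[^()|^ginouy]', '_', s)

print(original == equivalent)  # True
print(original)                # _____going__o___no______n__you_
```

Every character outside that set (including the "k" in "thank") becomes an underscore, while stray "o" and "n" characters survive because they happen to be in the set.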

Answer

Not quite as easy as it first appears, since REs have no simple way to say "anything except this word": ^ inside [ ] only negates single characters (as you found). Here is my solution:

import re

def subit(m):
    # group 1: the text before a kept word; group 2: the kept word ('' at end of string)
    stuff, word = m.groups()
    return ("_" * len(stuff)) + word

s = 'I am going home now, thank you.' # string to modify
print(re.sub(r'(.+?)(going|you|$)', subit, s))

Gives:

_____going_________________you_

To explain: the RE (I always use raw strings) matches one or more of any character (.+), non-greedily (?), so it stops as soon as what follows can match. This is captured in the first parenthesised group. That is followed by either "going", "you", or the end of the string ($), which is captured in the second group.
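One way to see exactly what the two groups capture is to list the matches with `re.finditer` instead of substituting. A small sketch (the variable names are just for illustration):

```python
import re

s = 'I am going home now, thank you.'
pattern = re.compile(r'(.+?)(going|you|$)')

# For each match, group 1 is the shortest run of characters followed by
# "going", "you", or the end of the string; group 2 is whatever stopped it.
pairs = [m.groups() for m in pattern.finditer(s)]
for stuff, word in pairs:
    print(repr(stuff), repr(word))
# 'I am ' 'going'
# ' home now, thank ' 'you'
# '.' ''
```

The last match is the trailing "." followed by the end of the string, which is why its second group is empty.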

subit is a function (the name is arbitrary) that re.sub calls for each match. It receives a match object, from which we retrieve the captured groups. For the first group we only need its length, since we are replacing each of its characters with an underscore; the second group (the word to keep) is appended unchanged. The returned string replaces the text that matched the pattern.
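The same callback idea extends to an arbitrary list of words. The sketch below is an alternative formulation, not the pattern used above: the hypothetical helper `blank_except` matches either one of the kept words or any single character, and the callback returns the word unchanged or a single fill character. Like the answer's pattern, it makes no word-boundary checks, so "going" inside "outgoing" would also be kept.

```python
import re

def blank_except(text, words, fill="_"):
    """Replace every character of text with fill, except occurrences of words.

    Hypothetical helper for illustration; no word-boundary checks are made.
    """
    words = [w for w in words if w]  # an empty word would match everywhere
    if not words:
        return fill * len(text)
    # Escape each word and try longer words first, so "going" beats "go".
    alternatives = sorted((re.escape(w) for w in words), key=len, reverse=True)
    # Group 1 is a kept word; otherwise "." consumes exactly one character.
    # DOTALL lets "." match newlines too, so they are blanked as well.
    pattern = re.compile("(" + "|".join(alternatives) + ")|.", re.DOTALL)

    def repl(m):
        kept = m.group(1)
        return kept if kept is not None else fill

    return pattern.sub(repl, text)


print(blank_except('I am going home now, thank you.', ['going', 'you']))
# _____going_________________you_
print(blank_except('going home', ['going', 'you']))
# going_____
```

Because `(.+?)` must consume at least one character before a kept word can match, the answer's pattern turns 'going home' into all underscores; trying the kept words first at every position avoids that edge case.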

Python regular expression to replace everything but specific words