Question 1

text="\xe2\x80\x94"
print re.sub(r'(\\(?<=\\)x[a-z0-9]{2})+',"replacement_text",text)

output is —

How can I handle the hexadecimal characters in this situation?

Answer

Your input doesn't have backslashes. It has 3 bytes, the UTF-8 encoding for the U+2014 EM DASH character:

>>> text = "\xe2\x80\x94"
>>> len(text)
3
>>> text[0]
'\xe2'
>>> text.decode('utf8')
u'\u2014'
>>> print text.decode('utf8')
—

You either need to match those UTF-8 bytes directly, or decode from UTF-8 to unicode and match the codepoint. The latter is preferable; once the text is Unicode, each character is a single codepoint rather than a variable number of bytes, so a pattern only has to handle one unit per character.
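As a minimal sketch of the decode-then-match approach, here is the same idea written for Python 3, where the raw bytes are a bytes object and the decoded text is a str. The variable names (raw, text, result) are only illustrative, and the Python 3 syntax is an assumption; the session above is Python 2.

```python
import re

# The three raw bytes from the question (a bytes object in Python 3).
raw = b"\xe2\x80\x94"

# Decode from UTF-8 first; the three bytes become one codepoint, U+2014.
text = raw.decode("utf8")

# Match the codepoint itself rather than its encoded bytes.
result = re.sub("\u2014", "replacement_text", text)
print(result)  # replacement_text
```

After decoding, len(text) is 1, so the pattern only has to name one character.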

Also note that Python's repr() output (which is used implicitly when echoing in the interactive interpreter or when printing lists, dicts or other containers) uses \xhh escape sequences to represent any non-printable character; the backslashes are part of that display only, not of the string itself. For UTF-8 strings, that includes anything outside the ASCII range. You could just replace anything outside that range with:

re.sub(r'[\x80-\xff]+', "replacement_text", text)

Take into account that this matches runs of adjacent UTF-8-encoded characters and replaces each run as a single group!
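To illustrate the grouping behaviour, here is a small sketch, again assuming Python 3 (bytes patterns need the rb prefix there). The sample data and the "X" replacement are made up for the example. It compares the byte-level pattern with a per-character pattern applied after decoding.

```python
import re

# Hypothetical sample: two em dashes in a row, then one on its own (UTF-8 bytes).
data = "a\u2014\u2014b\u2014c".encode("utf8")

# Byte-level pattern: the two adjacent dashes form one run of six non-ASCII
# bytes, so they are replaced together by a single replacement.
grouped = re.sub(rb"[\x80-\xff]+", b"X", data)

# Decoding first lets the pattern count characters instead of bytes.
per_char = re.sub(r"[^\x00-\x7f]", "X", data.decode("utf8"))

print(grouped)   # b'aXbXc'
print(per_char)  # aXXbXc
```

If one replacement per character is what you want, decoding first is the simpler route; the byte-level pattern cannot tell where one encoded character ends and the next begins.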

python regex: how to remove hex dec characters from string [duplicate]
