python regex: how to remove hex dec characters from string [duplicate]

2024/11/15 19:41:54
import re

text = "\xe2\x80\x94"
print re.sub(r'(\\(?<=\\)x[a-z0-9]{2})+', "replacement_text", text)

output is

How can I handle the hexadecimal characters in this situation?

Answer

Your input doesn't have backslashes. It has 3 bytes, the UTF-8 encoding for the U+2014 EM DASH character:

>>> text = "\xe2\x80\x94"
>>> len(text)
3
>>> text[0]
'\xe2'
>>> text.decode('utf8')
u'\u2014'
>>> print text.decode('utf8')
—
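To make the difference concrete, here is a minimal sketch contrasting the real three-byte input with a string that actually contains backslashes. It is written for Python 3 (the session above is Python 2), so the input is a bytes literal and the pattern is a bytes pattern; the names `actual` and `escaped` are only illustrative.

```python
import re

# The question's pattern, as a bytes pattern so it can be applied to bytes.
pattern = br'(\\(?<=\\)x[a-z0-9]{2})+'

actual = b"\xe2\x80\x94"    # 3 bytes, no backslash anywhere
escaped = br"\xe2\x80\x94"  # 12 bytes: literal backslash, 'x', 'e', '2', ...

# Nothing to match in the real input, so it comes back unchanged.
unchanged = re.sub(pattern, b"replacement_text", actual)

# The pattern only matches when the backslashes are really in the data.
replaced = re.sub(pattern, b"replacement_text", escaped)

print(len(actual), len(escaped))
print(unchanged == actual, replaced)
```

The backslashes only exist in the source-code spelling (and in the repr) of the string, not in the data the regex sees.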

You either need to match those UTF-8 bytes directly, or decode from UTF-8 to unicode and match the code point. The latter is preferable; when you work with text as Unicode, each character is a single code point rather than a run of several bytes, so there is less to match and transform.
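A minimal sketch of the decode-then-match approach, assuming Python 3 and assuming the input really is UTF-8 (the names `raw`, `decoded` and `cleaned` are made up for this example):

```python
import re

raw = b"\xe2\x80\x94"          # the three UTF-8 bytes from the question
decoded = raw.decode("utf-8")  # one code point: U+2014 EM DASH

# Match the code point itself instead of its three UTF-8 bytes.
cleaned = re.sub("\u2014", "replacement_text", decoded)

print(len(raw), len(decoded))
print(cleaned)
```

In Python 2 the same idea applies with `text.decode('utf8')` and a `u'\u2014'` pattern.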

Also note that Python's repr() output (which is used implicitly when echoing in the interactive interpreter or when printing lists, dicts or other containers) uses \xhh escape sequences to represent any non-printable character. For UTF-8 strings, that includes anything outside the ASCII range. You could just replace anything outside that range with:

re.sub(r'[\x80-\xff]+', "replacement_text", text)

Keep in mind that this matches a run of consecutive UTF-8-encoded characters as a single group and replaces the whole run with one copy of the replacement text!
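The grouping behaviour can be seen in a small sketch, again assuming Python 3 and UTF-8 input. The sample string and the `?` replacement are made up for illustration; the second call shows one possible way to replace each non-ASCII character separately by decoding first.

```python
import re

raw = b"a\xe2\x80\x94\xe2\x80\x94b"  # "a", two EM DASH characters, "b" as UTF-8

# Byte-level pattern with +: both dashes (six bytes) form one run,
# so the replacement appears only once.
grouped = re.sub(br"[\x80-\xff]+", b"?", raw)

# Decoded text, no +: each non-ASCII code point is replaced on its own.
per_char = re.sub(r"[^\x00-\x7f]", "?", raw.decode("utf-8"))

print(grouped)
print(per_char)
```

Dropping the `+` from the byte-level pattern would not give one replacement per character, since each of the three bytes of a dash would then be replaced individually.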

