How to convert \xXY encoded characters to UTF-8 in Python?

2024/9/8 10:51:32

I have a text which contains characters such as "\xaf", "\xbe", which, as I understand it from this question, are ASCII encoded characters.

I want to convert them in Python to their UTF-8 equivalents. The usual string.encode("utf-8") throws UnicodeDecodeError. Is there some better way, e.g., with the codecs standard library?

Sample 200 characters here.

Answer

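Your file is already UTF-8 encoded. The \xXY sequences are the raw bytes of that encoding, so the data needs to be decoded from UTF-8 rather than encoded to it.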

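# saved encoding-sample to /tmp/encoding-sample
import codecs
import sys
import unicodedata as ud

# read the file, decoding its bytes as UTF-8
with codecs.open("/tmp/encoding-sample", "r", "utf8") as fp:
    data = fp.read()

# list every distinct character with its code point and Unicode name
chars = sorted(set(data))
for char in chars:
    try:
        charname = ud.name(char)
    except ValueError:
        # control characters have no name in unicodedata
        charname = "<unknown>"
    sys.stdout.write("char U%04x %s\n" % (ord(char), charname))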

And manually filling in the unknown names:
char U000a LINE FEED
char U001e INFORMATION SEPARATOR TWO
char U001f INFORMATION SEPARATOR ONE
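
For readers on Python 3, the same idea can be sketched as follows. This is a minimal illustration, not the asker's actual data: the bytes in `raw` and the helper `is_valid_utf8` are made up for the example. It shows that \xXY sequences in a bytes literal are raw bytes, that decoding them as UTF-8 gives the text, and that a stray pair such as "\xaf\xbe" is not valid UTF-8 on its own.

```python
# Hypothetical sample bytes (not from the question's file):
# UTF-8 for "café", followed by U+001F and a line feed.
raw = b"caf\xc3\xa9\x1f\n"

# Decoding interprets the raw bytes as UTF-8 and yields text.
text = raw.decode("utf-8")

# Encoding the text again round-trips to the same bytes.
round_trip = text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

If `is_valid_utf8` returns False for your data, the file is probably in some other encoding, and the right codec has to be passed to `decode` instead.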

