Convert UTF-8 to string literals in Python

2024/10/14 23:10:54

I have a string in UTF-8 format but not so sure how to convert this string to it's corresponding character literal. For example I have the string:

My string is: 'Entre\xc3\xa9'

Example one:

This code:

u'Entre\xc3\xa9'.encode('latin-1').decode('utf-8')

returns the result: u'Entre\xe9'

If I then continue by printing this:

print u'Entre\xe9'

I get the result: Entreé

This is great and close to what I need. The problem is, I can't make 'Entre\xc3\xa9' a variable and pass it through the steps as this now breaks. Any tips for getting this working?

Example:

a = 'Entre\xc3\xa9'
b = 'u'+ a.encode('latin-1').decode('utf-8')
c= 'u'+ b

I would like result of "c" to be:

Entreé
Answer

The u'' syntax only works for string literals, e.g. defining values in source code. Using the syntax results in a unicode object being created, but that's not the only way to create such an object.

You cannot make a unicode value from a byte string by adding u in front of it. But if you called str.decode() with the right encoding, you get a unicode value. Vice-versa, you can encode unicode objects to byte strings with unicode.encode().

Note that when displaying a unicode object, Python represents it by using the Unicode string literal syntax again (so u'...'), to ease debugging. You can paste the representation back in to a Python interpreter and get an object with the same value.

Your a value is defined using a byte string literal, so you only need to decode:

a = 'Entre\xc3\xa9'
b = a.decode('utf8')

Your first example created a Mojibake, a Unicode string containing Latin-1 codepoints that actually represent UTF-8 bytes. This is why you had to encode to Latin-1 first (to undo the Mojibake), then decode from UTF-8.

You may want to read up on Python and Unicode in the Unicode HOWTO. Other articles of interest are:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

  • Pragmatic Unicode by Ned Batchelder

https://en.xdnf.cn/q/69358.html

Related Q&A

Memory usage not getting lowered even after job is completed successfully

I have a job added in apscheduler which loads some data in memory and I am deleting all the objects after the job is complete. Now if I run this job with python it works successfully and memory drop af…

How to output sklearn standardscaler

I have standardized my data in sklearn using preprocessing.standardscaler. Question is how could I save this in my local for latter use?Thanks

How to use Jobqueue in Python-telegram-bot

I have able to make a bot very easily by reading the docs but Jobqueue is not working as per it is written. The run_daily method uses a datetime.time object to send the message at a particular time but…

Is there a way to override default assert in pytest (python)?

Id like to a log some information to a file/database every time assert is invoked. Is there a way to override assert or register some sort of callback function to do this, every time assert is invoked?…

How to install pycairo on osx?

I am trying to install the pycairo (Python bindings for the cairo graphics library) under OSX.I started witheasy_install pycairoand got: Requested cairo >= 1.8.8 but version of cairo is 1.0.4error: …

How do I change a value in a .npz file?

I want to change one value in an npz file.The npz file contains several npys, I want all but one ( run_param ) to remain unchanged and I want to save over the original file.This is my working code:DATA…

How to load *.hdr files using python

I would like to read an environment map in *.hdr file format. It seems that very popular libraries doesnt support .hdr file reading, for example, OpenCV, PIL etc.. So how to read a .hdr file into a num…

What is the preferred way to compose a set from multiple lists in Python

I have a few different lists that I want to turn into a set and use to find the difference from another set. Lets call them A, B, and C. Is the more optimal way to do this set(A + B + C) or set(A).unio…

matplotlib plot small image without resampling

Im trying to plot a small image in python using matplotlib and would like the displayed axes to have the same shape as the numpy array it was generated from, i.e. the data should not be resampled. In o…

Nullable ForeignKeys and deleting a referenced model instance

I have a ForeignKey which can be null in my model to model a loose coupling between the models. It looks somewhat like that:class Message(models.Model):sender = models.ForeignKey(User, null=True, blank…