What on earth is the Unicode number?

2024/10/13 19:23:17

In Python:

>>> "\xc4\xe3".decode("gbk").encode("utf-8")
'\xe4\xbd\xa0'
>>> "\xc4\xe3".decode("gbk")
u'\u4f60'

This suggests two conclusions:

1. \xc4\xe3 in GBK = \xe4\xbd\xa0 in UTF-8
2. \xc4\xe3 in GBK = \x4f\x60 in Unicode (or, say, in UCS-2)

In R:

> iconv("\xc4\xe3",from="gbk",to="utf-8",toRaw=TRUE)
[[1]]
[1] e4 bd a0
> iconv("\xc4\xe3",from="gbk",to="unicode",toRaw=TRUE)
[[1]]
[1] ff fe 60 4f

Conclusion 1 holds: Python and R give the same result.
Conclusion 2 is a puzzle:
what on earth is \xc4\xe3 in GBK equal to in Unicode?
In Python it is u'\u4f60'; in R it is ff fe 60 4f.
Are they equal? Which one is correct? Are they both correct?

Answer

In Python, the \uxxxx notation refers to a Unicode code point, not to any encoding of that code point. So u'\u4f60' is the code point U+4F60 itself, not a sequence of bytes.

UCS-2, UTF-16, UTF-8 are all encodings capable of capturing those codepoints in bytes suitable for storage in files, for transferring across a network, etc.
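
To make the distinction concrete, here is a small sketch in Python 3 (the session in the question is Python 2, where byte strings and unicode strings are spelled differently). The variable names are illustrative only; the byte values come from the question.

```python
# Python 3 sketch: one code point, several byte encodings.
gbk_bytes = b"\xc4\xe3"            # the GBK bytes from the question
text = gbk_bytes.decode("gbk")     # a str holding a single code point

code_point = ord(text)             # the abstract Unicode number
utf8_bytes = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")  # explicit byte order, no BOM
utf16_le = text.encode("utf-16-le")  # explicit byte order, no BOM

print(hex(code_point))   # 0x4f60
print(utf8_bytes.hex())  # e4bda0
print(utf16_be.hex())    # 4f60
print(utf16_le.hex())    # 604f
```

The code point is always 0x4f60; only the bytes used to store it change with the encoding.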

R's output is the UTF-16 encoding of the code point U+4F60, preceded by the UTF-16 Byte Order Mark, or BOM. The BOM indicates which byte order was chosen: the bytes FF FE mean little endian, which is why the character itself follows as 60 4f. Python includes the BOM too when you encode to UTF-16:

>>> u'\u4f60'.encode('utf16')
'\xff\xfe`O'

Here ` is the byte 0x60 and O is the byte 0x4f. The big-endian BOM is the byte sequence FE FF. You can explicitly encode to utf-16be or utf-16le in Python to avoid the BOM being included, because you've made an explicit choice:

>>> u'\u4f60'.encode('utf-16be')
'O`'
>>> u'\u4f60'.encode('utf-16le')
'`O'
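
As a cross-check, the four bytes R printed can be decoded directly. This is a Python 3 sketch, not part of the original session; `r_output` is just a name chosen here for the bytes from the R output.

```python
import codecs

r_output = bytes.fromhex("fffe604f")   # ff fe 60 4f, as printed by R

# The generic "utf-16" decoder reads the BOM, picks the byte order,
# and drops the BOM from the result.
decoded = r_output.decode("utf-16")

# Same result when the BOM is stripped by hand and the byte order
# is stated explicitly.
has_le_bom = r_output.startswith(codecs.BOM_UTF16_LE)
manual = r_output[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")

print(has_le_bom)            # True
print(decoded == "\u4f60")   # True
print(manual == decoded)     # True
```

So Python's u'\u4f60' and R's ff fe 60 4f describe the same character: the first is the code point, the second is its little-endian UTF-16 encoding with a BOM in front.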

You really should read the Joel Spolsky Unicode article, as well as the Python Unicode HOWTO to more fully appreciate the difference between Unicode and encodings.
