What does sys.maxunicode mean?

2024/10/4 15:36:57

CPython stores Unicode strings internally as either UTF-16 or UTF-32, depending on compile options. In UTF-16 builds of Python, string slicing, iteration, and len seem to work on code units, not code points, so characters outside the Basic Multilingual Plane (BMP) behave strangely.

E.g., on CPython 2.6 with sys.maxunicode = 65535:

>>> char = u'\U0001D49E'
>>> len(char)
2
>>> char[0:1]
u'\ud835'
>>> char[1:2]
u'\udc9e'
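
The two halves shown above are the UTF-16 surrogate pair for U+1D49E. As a rough sketch of the arithmetic behind that split, and of how the pair can be joined back into one code point, something like the following should work. The function names to_surrogates and code_points are hypothetical helpers introduced here for illustration, not part of any Python API, and the code works on plain integers so that it behaves the same on any build:

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000             # 20-bit value to be split in two
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


def code_points(units):
    """Yield code points from a list of UTF-16 code units, joining valid pairs."""
    i, n = 0, len(units)
    while i < n:
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: one code point.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP code unit, or an unpaired surrogate passed through unchanged.
            yield unit
            i += 1


high, low = to_surrogates(0x1D49E)
print(hex(high), hex(low))
print([hex(cp) for cp in code_points([0x61, high, low])])
```

On a narrow build, the list passed to code_points could be built with [ord(u) for u in char], since each element of the string is then a single code unit.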

According to the Python documentation, sys.maxunicode is "An integer giving the largest supported code point for a Unicode character."

Does this mean that Unicode operations aren't guaranteed to work on code points beyond sys.maxunicode? If I want to work with characters outside the BMP, do I either have to use a UTF-32 build or write my own portable Unicode operations?

I came across this problem in "How to iterate over Unicode characters in Python 3?"

Answer

Characters beyond sys.maxunicode = 65535 are stored internally as UTF-16 surrogate pairs. Yes, you have to deal with this yourself or use a wide build. Even with a wide build, you may also have to deal with single characters that are represented by a combination of code points. For example:

>>> print('a\u0301')
á
>>> print('\xe1')
á

The first uses a combining accent character and the second doesn't. Both print the same. You can use unicodedata.normalize to convert between the two forms.
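
As a minimal sketch of that conversion, assuming Python 3 string literals: 'NFC' composes a base letter and its combining mark into the single precomposed code point where one exists, and 'NFD' decomposes it again. The variable names are only illustrative:

```python
import unicodedata

decomposed = 'a\u0301'   # 'a' followed by COMBINING ACUTE ACCENT (two code points)
precomposed = '\xe1'     # LATIN SMALL LETTER A WITH ACUTE (one code point)

# The two strings render the same but do not compare equal as-is.
print(decomposed == precomposed, len(decomposed), len(precomposed))

# Normalizing both to the same form makes them comparable.
nfc = unicodedata.normalize('NFC', decomposed)
nfd = unicodedata.normalize('NFD', precomposed)
print(nfc == precomposed, nfd == decomposed)
```

Normalizing both sides to one form before comparing or measuring strings avoids treating the two spellings as different text.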

