How well does your language support unicode in practice?

2024/11/16 15:45:47

I'm looking into new languages, kind of craving for one where I no longer need to worry about charset problems amongst inordinate amounts of other niggles I have with PHP for a new project.

I tend to find Java too verbose and messy, and my not wanting to touch Windows with a 6-foot pole tends to rule out .Net. That leaves essentially everything else -- except PHP, C and C++ (the latter two of which I know get messy with unicode stuff irrespective of the ICU library).

I've short listed a few languages to date, namely Ruby (loved the mixins), Python, Lisp and Javascript (node.js). However, I'm coming with highly inconsistent information on unicode support and I'm dreading (lack of time...) to learn each and every one of them to the point where I can safely break it to rule it out.

In so far as I understood, Python 3 seems to have it. As does Ruby 1.9. Lisp not necessarily. Javascript presumably.

There's arguably more than unicode support to a language, but in my experience it tends to become a major drawback when dealing with locale.

I also realize the question is somewhat subjective. (Please don't close it on that grounds: I'm actually linking to several SO threads which I found unsatisfying.) But... as a user of any of these languages, how well do they support unicode in practice?

Answer

Python's unicode support did not really change in 3.x. The unicode support in Python has been pretty much the same since Python 2.x, which introduced the separate unicode type and the encoding handling. What Python 3.x changes is that unicode becomes the only string type (and is renamed to str), whereas 2.x has bytestrings (str, "...") and unicode strings (unicode, u"...") that often but not always don't quite mix. (Allowing them to mix was an attempt to make transitioning from bytestrings to unicode easier, but it turned out a mistake.) All in all, Python's unicode support is quite good, mistakes in Python 2.x notwithstanding. There's unicode literals with numeric and named escapes, source-encoding declarations for non-ASCII characters in unicode literals, automatic encoding/decoding through the codecs module, unicode support in many libraries (like the regular expression and DB-API modules) and a builtin unicode database.

That said, you still need to know about encodings in order to handle text correctly. Your program will receive bytes in some encoding (be it from files, from environment variables or through other input) and they will need to be interpreted in that encoding. If you don't know the encoding (and can't determine it from the data, like in HTML or XML) you can really only process the data as bytes. If you do know the encoding, Python does allow you to deal with it mostly transparently.

https://en.xdnf.cn/q/71326.html

Related Q&A

analogy to scipy.interpolate.griddata?

I want to interpolate a given 3D point cloud:I had a look at scipy.interpolate.griddata and the result is exactly what I need, but as I understand, I need to input "griddata" which means some…

Mongodb TTL expires documents early

I am trying insert a document into a Mongo database and have it automatically expire itself after a predetermine time. So far, my document get inserted but always get deleted from the database from 0 -…

sqlalchemy, hybrid property case statement

This is the query Im trying to produce through sqlalchemySELECT "order".id AS "id", "order".created_at AS "created_at", "order".updated_at AS "u…

Whats the newest way to develop gnome panel applets (using python)

Today Ive switched to GNOME (from XFCE) and found some of the cool stuff missing and I would like to (try to) do them on my own. I tried to find information on how to develop Gnome applets (items you p…

How to install opencv-python in python 3.8

Im having problem during the installation of opencv-python in pycharm. After opening pycharm I click on settings and then project interpreter, I click on + and search for the correct module, I started …

python augmented assignment for boolean operators

Does Python have augmented assignment statements corresponding to its boolean operators?For example I can write this:x = x + 1or this:x += 1Is there something I can write in place of this:x = x and yT…

Change a pandas DataFrame column value based on another column value

I have a dataframe with two columns each of which represents an organism. They are called ORG1 and ORG2 I want to move the values of ORG2 into ORG1 for the corresponding index value.So, if ORG1 is A a…

How to lock a sqlite3 database in Python?

Is there a way to explicitly acquire a lock on a sqlite3 database in Python?

How to install pyzmq on an Alpine Linux container?

I have a container with the python:3.6-alpine kernel. I have a problem installing the pyzmq via pip on this: Dockerfile: FROM python:3.6-alpineRUN mkdir /code RUN apk add vim WORKDIR / ADD . /codedocke…

How to Get Value Out from the Tkinter Slider (Scale)?

So, here is the code I have, and as I run it, the value of the slider bar appears above the slider, I wonder is there a way to get that value out? Maybe let a=that value. ;)from Tkinter import *contr…