Can I turn off implicit Python unicode conversions to find my mixed-strings bugs?

2024/9/20 15:22:46

When profiling our code I was surprised to find millions of calls to
C:\Python26\lib\encodings\utf_8.py:15(decode)

I started debugging and found that our code base has many small bugs, usually comparing a str to a unicode object or concatenating a str and a unicode. Python graciously decodes the byte strings and performs these operations in unicode.
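For reference, in Python 2 an expression like u"abc" + "xyz" behaves roughly as if the byte string had first been decoded with sys.getdefaultencoding(). The sketch below spells out that hidden decode explicitly (the name implicit_concat is hypothetical). It also runs on Python 3, where the implicit step no longer exists and u"abc" + b"xyz" raises TypeError instead.

```python
import sys


def implicit_concat(text, raw):
    """Spell out the hidden step behind Python 2's `text + raw`.

    `text` is a unicode object and `raw` a byte string (str on Python 2,
    bytes on Python 3). Python 2 decodes `raw` with the interpreter's
    default encoding before concatenating; here that decode is explicit.
    """
    return text + raw.decode(sys.getdefaultencoding())


print(implicit_concat(u"abc", b"xyz"))  # prints abcxyz
```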

How kind. But expensive!

I am fluent in unicode, having read Joel Spolsky and Dive Into Python...

I try to keep our code internals in unicode only.

My question: can I turn off this Pythonic nice-guy behavior, at least until I find all these bugs and fix them (usually by adding a u prefix to a string literal)?

Some of them are extremely hard to find (a variable that is sometimes a string...).

Python 2.6.5 (and I can't switch to 3.x).

Answer

The following should work:

>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('undefined')
>>> u"abc" + u"xyz"
u'abcxyz'
>>> u"abc" + "xyz"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/encodings/undefined.py", line 22, in decode
    raise UnicodeError("undefined encoding")
UnicodeError: undefined encoding
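The traceback ends in encodings/undefined.py because 'undefined' is a real codec in the standard library whose decoder always refuses. A small sketch that exercises it directly, outside any default-encoding trick, runs on both Python 2 and 3 (the helper try_decode is hypothetical):

```python
import codecs

# 'undefined' ships with the stdlib encodings package; decoding a
# non-empty byte string through it raises UnicodeError.
info = codecs.lookup("undefined")


def try_decode(raw, encoding="undefined"):
    """Decode raw; return (text, None) on success or (None, message) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeError as exc:
        return None, str(exc)


print(info.name)                    # the codec installed as default above
print(try_decode(b"xyz"))           # refused by the 'undefined' codec
print(try_decode(b"xyz", "ascii"))  # an explicit, real encoding still works
```

Since every implicit str/unicode coercion goes through the default codec, pointing it at 'undefined' turns each hidden decode into a traceback that names the offending line.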

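The reload(sys) call in the interactive session above is needed only because site.py deletes sys.setdefaultencoding during interpreter start-up, and reloading sys brings it back. Normally the sys.setdefaultencoding call goes in a sitecustomize.py file in your Python site-packages directory, which site.py imports before removing the function (that is the advisable place for it).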
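A minimal sketch of such a sitecustomize.py, assuming Python 2's site.py behavior of importing sitecustomize before it deletes sys.setdefaultencoding. The function name make_implicit_coercion_fail is hypothetical, and the getattr guard is an addition so the file is a harmless no-op on interpreters (such as Python 3) that have no setdefaultencoding.

```python
# sitecustomize.py: put this file in your Python site-packages directory.
# Debugging aid: make every implicit str <-> unicode coercion raise
# UnicodeError, so mixed-type operations show up as tracebacks.
import sys


def make_implicit_coercion_fail():
    """Set the default encoding to the 'undefined' codec when possible.

    Python 2's site.py imports sitecustomize before it deletes
    sys.setdefaultencoding, so the function is still reachable here.
    Returns True if the default encoding was changed, False if this
    interpreter offers no setdefaultencoding (e.g. Python 3).
    """
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        return False
    setter("undefined")
    return True


make_implicit_coercion_fail()
```

Because the setting is process-wide, it also trips on implicit coercions inside third-party libraries. That helps locate bugs, but it means the file belongs in a debugging environment rather than in production.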

