decoding shift-jis: illegal multibyte sequence

2024/10/8 8:32:31

I'm trying to decode a shift-jis encoded string, like this:

string.decode('shift-jis').encode('utf-8')

to be able to view it in my program.

When I come across 2 shift-jis characters, in hex "0x87 0x54" and "0x87 0x55", I get this error:

UnicodeDecodeError: 'shift_jis' codec can't decode bytes in position 12-13: illegal multibyte sequence

But I'm sure they are valid shift-jis characters: http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml

I've also noticed that those characters appear as black boxes in my shift-jis text editor, which means they are not recognized. So there's something special about these two chars that made my editor and Python decoder fail. Help?

(sorry, I couldn't post an example string because when those characters are present, it doesn't get added to the clipboard from there onward and also gets converted to unicode automatically. I posted the hex values for them though.)

Answer

Multiple versions of Shift JIS exist. The shift_jis codec is JIS X 0208, whereas that table is JIS X 0213, corresponding to the shift_jisx0213 codec.

>>> u'⑲⑳Ⅰ'.encode('shift_jisx0213')
'\x87R\x87S\x87T'
https://en.xdnf.cn/q/70140.html

Related Q&A

Add columns in pandas dataframe dynamically

I have following code to load dataframe import pandas as pdufo = pd.read_csv(csv_path) print(ufo.loc[[0,1,2] , :])which gives following output, see the structure of the csvCity Colors Reported Shape Re…

How do you add input from user into list in Python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 9 years ago.Improve…

How to suppress matplotlib inline for a single cell in Jupyter Notebooks/Lab?

I was looking at matplotlib python inline on/off and this kind of solves the problem but when I do plt.ion() all of the Figures pop up (100s of figures). I want to keep them suppressed in a single cel…

Using django-filer, can I chose the folder that images go into, from Unsorted Uploads

Im using django-filer for the first time, and it looks great, and work pretty well.But all my images are being uploaded to the Unsorted Uploads folder, and I cant figure out a way to put them in a spec…

how to use distutils to create executable .zip file?

Python 2.6 and beyond has the ability to directly execute a .zip file if the zip file contains a __main__.py file at the top of the zip archive. Im wanting to leverage this feature to provide preview r…

Pass nested dictionary location as parameter in Python

If I have a nested dictionary I can get a key by indexing like so: >>> d = {a:{b:c}} >>> d[a][b] cAm I able to pass that indexing as a function parameter? def get_nested_value(d, pat…

Correct way to deprecate parameter alias in click

I want to deprecate a parameter alias in click (say, switch from underscores to dashes). For a while, I want both formulations to be valid, but throw a FutureWarning when the parameter is invoked with …

How to rename a file with non-ASCII character encoding to ASCII

I have the file name, "abc枚.xlsx", containing some kind of non-ASCII character encoding and Id like to remove all non-ASCII characters to rename it to "abc.xlsx".Here is what Ive t…

draw random element in numpy

I have an array of element probabilities, lets say [0.1, 0.2, 0.5, 0.2]. The array sums up to 1.0.Using plain Python or numpy, I want to draw elements proportional to their probability: the first eleme…

Python Gevent Pywsgi server with ssl

Im trying to use gevent.pywsgi.WSGIServer to wrap a Flask app. Everything works fine, however, when I try to add a key and a certificate for ssl, its not even able to accept any clients anymore.This is…