decoding shift-jis: illegal multibyte sequence

2024/10/8 8:32:31

I'm trying to decode a shift-jis encoded string so that I can view it in my program.

When I come across two particular shift-jis characters, the byte pairs "0x87 0x54" and "0x87 0x55" in hex, I get this error:

UnicodeDecodeError: 'shift_jis' codec can't decode bytes in position 12-13: illegal multibyte sequence

But I'm sure they are valid shift-jis characters:

I've also noticed that those characters appear as black boxes in my shift-jis text editor, which means the editor doesn't recognize them either. So there's something special about these two characters that makes both my editor and the Python decoder fail. Help?

(Sorry, I couldn't post an example string: when those characters are present, nothing from that point onward gets copied to the clipboard, and the text also gets converted to unicode automatically. I've posted the hex values for them instead.)


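Multiple versions of Shift JIS exist. Python's shift_jis codec covers only JIS X 0208, whereas that table is for JIS X 0213, which corresponds to the shift_jisx0213 codec. The two byte pairs you hit are defined in JIS X 0213 but not in JIS X 0208, so the plain shift_jis codec rejects them.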

>>> u'⑲⑳Ⅰ'.encode('shift_jisx0213')
'\x87R\x87S\x87T'
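
Going the other way, here is a minimal sketch of decoding the two byte pairs from the question. It assumes Python 3, and the variable names (raw, strict_ok, decoded) are only illustrative. It shows the plain shift_jis codec failing and shift_jisx0213 succeeding on the same bytes.

```python
# The two byte pairs from the question: 0x87 0x54 and 0x87 0x55.
raw = b"\x87\x54\x87\x55"

# The plain shift_jis codec (JIS X 0208) does not define these bytes,
# so decoding raises UnicodeDecodeError.
try:
    raw.decode("shift_jis")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The JIS X 0213 variant of the codec does define them.
decoded = raw.decode("shift_jisx0213")

print(strict_ok)  # whether the plain codec managed to decode the bytes
print(decoded)    # the decoded characters
```

If the data actually came from a Windows program, cp932 is another codec that maps these bytes. Which one is right depends on where the text was produced, so treat the choice of codec as something to check against your own data.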

