How to get email.Header.decode_header to work with non-ASCII characters?

2024/10/8 6:30:01

I'm borrowing the following code to parse email headers, and additionally to add a header further down the line. Admittedly, I don't fully understand the reason for all the scaffolding around what should be straightforward usage of the email.Header module.

Noteworthy is that Header is not instantiated; rather, its decode_header function is called:

class DecodedHeader(object):
    def __init__(self, s, folder):
        self.msg = email.message_from_string(s[1])
        self.info = parseList(s[0])
        self.folder = folder

    def __getitem__(self, name):
        if name.lower() == 'folder': return self.folder
        elif name.lower() == 'uid': return self.info[1][3]
        elif name.lower() == 'flags': return ','.join(self.info[1][1])
        elif name.lower() == 'internal-date':
            ds = self.info[1][5]
            if Options.dateFormat:
                ds = time.strftime(Options.dateFormat, imaplib.Internaldate2tuple('INTERNALDATE "'+ds+'"'))
            return ds
        elif name.lower() == 'size': return self.info[1][7]
        val = self.msg.__getitem__(name)
        if val == None: return None
        return self._convert(email.Header.decode_header(val), name)

    def get(self, key, default=None):
        return self.__getitem__(key)

    def _convert(self, list, name):
        l = []
        for s, encoding in list:
            try:
                if (encoding != None):
                    s = unicode(s, encoding, 'replace').encode(Options.encoding, 'replace')
            except Exception, e:
                print >>sys.stderr, "Encoding error", e
            l.append(s)
        res = "".join(l)
        if Options.addr and name.lower() in ('from','to', 'cc', 'return-path','reply-to' ): res = self._modifyAddr(res)
        if Options.dateFormat and name.lower() in ('date'): res = self._formatDate(res)
        return res

Here's the problem: When the header (val) contains non-ASCII characters such as Ä and ä, I get:

Traceback (most recent call last):
  File "v12.py", line 434, in <module>
    main()
  File "v12.py", line 396, in main
    writer.writerow(msg)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 149, in _dict_to_list
    return [rowdict.get(key, self.restval) for key in self.fieldnames]
  File "v12.py", line 198, in get
    return self.__getitem__(key)
  File "v12.py", line 196, in __getitem__
    return self._convert(email.Header.decode_header(val),name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/email/header.py", line 76, in decode_header
    header = str(header)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)

where u'\xe4' is ä.

I've tried a few things:

  • Adding # -*- coding: utf-8 -*- to the top of header.py
  • Calling unicode() on val before passing it to decode_header()
  • Calling .encode('utf-8') on val before passing it to decode_header()
  • Calling .encode('ISO-8859-1') on val before passing it to decode_header()

No joy with any of the above. What is the cause here? Given that I'm looking to maintain the usage of email.Header as above (with Header not instantiated directly), how do we ensure that non-ASCII characters get successfully decoded by decode_header?

Answer

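The header has to be encoded correctly in order to be decoded. It looks like val comes from an already existing message, so maybe that message is bad. The error indicates that val is a Unicode string, but it should be a byte string at that point: decode_header calls str(header), which in Python 2 tries to encode a Unicode string with the ASCII codec and fails on the first non-ASCII character. The examples in the Python documentation for email.header are straightforward.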
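To see why a Unicode string trips up that str() call, here is a minimal sketch that reproduces the same failure outside the email module. The sample text is hypothetical, and the explicit encode("ascii") stands in for the implicit conversion that Python 2's str() performs on a Unicode string.

```python
# Hypothetical sample value: a text string containing a non-ASCII character.
text = u"M\xe4rk"

try:
    # Same conversion that str(unicode_value) attempts implicitly in Python 2.
    text.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    # The codec refuses any code point above 127.
    message = str(exc)

print(message)
```

The message reports the same "ordinal not in range(128)" condition as the traceback, which suggests the value reaching decode_header was text rather than an encoded byte string.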

The following encodes two header parts that don't even use the same encoding:

>>> import email.header
>>> h = email.header.Header(u'To: Märk'.encode('iso-8859-1'),'iso-8859-1')
>>> h.append(u'From: Jòhñ'.encode('utf8'),'utf8')
>>> h
<email.header.Header instance at 0x00559F58>
>>> s = h.encode()
>>> s
'=?iso-8859-1?q?To=3A_M=E4rk?= =?utf-8?b?RnJvbTogSsOyaMOx?='

Note that the correctly encoded header is a byte string with the encoding names embedded, and it uses no non-ASCII characters.
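For readers on Python 3, a rough equivalent of the encoding step might look like the sketch below. It assumes Python 3, where Header accepts text (str) and applies the charset itself, so the explicit .encode() calls from the session above are not needed. The is_ascii variable is an illustrative name, not part of the library.

```python
from email.header import Header

# Python 3 assumption: pass text plus the charset to use for that part.
h = Header("To: M\u00e4rk", "iso-8859-1")
h.append("From: J\u00f2h\u00f1", "utf-8")

# encode() produces the RFC 2047 form with the charset names embedded.
encoded = h.encode()

# Check that every character of the encoded header is plain ASCII.
is_ascii = all(ord(ch) < 128 for ch in encoded)
print(encoded, is_ascii)
```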

This decodes them:

>>> email.header.decode_header(s)
[('To: M\xe4rk', 'iso-8859-1'), ('From: J\xc3\xb2h\xc3\xb1', 'utf-8')]
>>> d = email.header.decode_header(s)
>>> for s,e in d:
...  print s.decode(e)
...
To: Märk
From: Jòhñ
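The session above is Python 2. As a sketch for Python 3, where decode_header returns bytes for encoded words and plain text for a header with no encoded words, the decoding loop could be wrapped in a small helper. The name decode_to_text is hypothetical, and falling back to ASCII with 'replace' when no charset is given is an assumption, not something the original answer specifies.

```python
from email.header import decode_header

def decode_to_text(raw_header):
    """Hypothetical helper: decode an RFC 2047 header into one text string."""
    parts = []
    for value, charset in decode_header(raw_header):
        if isinstance(value, bytes):
            # Encoded words come back as bytes plus the charset they declared.
            parts.append(value.decode(charset or "ascii", "replace"))
        else:
            # A header with no encoded words comes back as text already.
            parts.append(value)
    return "".join(parts)

# The encoded header produced in the answer above.
s = "=?iso-8859-1?q?To=3A_M=E4rk?= =?utf-8?b?RnJvbTogSsOyaMOx?="
decoded = decode_to_text(s)
```

Note that the two parts are joined without a space, because whitespace between adjacent encoded words is ignored when decoding.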