Wrong encoding of email attachment

2024/10/8 10:53:19

I have a Python 2.7 script running on Windows. It logs in to Gmail and checks for new e-mails and attachments:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

file_types = ["pdf", "doc", "docx"]  # download attachments with these extensions

login = "login"
passw = "password"

imap_server = "imap.gmail.com"
smtp_server = "smtp.gmail.com"
smtp_port = 587

from smtplib import SMTP
from email.parser import HeaderParser
from email.MIMEText import MIMEText
import sys
import imaplib
import getpass
import email
import datetime
import os
import time

if __name__ == "__main__":
    try:
        while True:
            session = imaplib.IMAP4_SSL(imap_server)
            try:
                rv, data = session.login(login, passw)
                print "Logged in: ", rv
            except imaplib.IMAP4.error:
                print "Login failed!"
                sys.exit(1)

            rv, mailboxes = session.list()
            rv, data = session.select(foldr)
            rv, data = session.search(None, "(UNSEEN)")

            for num in data[0].split():
                rv, data = session.fetch(num, "(RFC822)")
                for rpart in data:
                    if isinstance(rpart, tuple):
                        msg = email.message_from_string(rpart[1])
                        to = email.utils.parseaddr(msg["From"])[1]

                text = data[0][1]
                msg = email.message_from_string(text)
                got = []
                for part in msg.walk():
                    if part.get_content_maintype() == 'multipart':
                        continue
                    if part.get('Content-Disposition') is None:
                        continue
                    filename = part.get_filename()
                    print "file: ", filename
                    print "Extention: ", filename.split(".")[-1]
                    if filename.split(".")[-1] not in file_types:
                        continue
                    data = part.get_payload(decode=True)
                    if not data:
                        continue
                    date = datetime.datetime.now().strftime("%Y-%m-%d")
                    if not os.path.isdir("CONTENT"):
                        os.mkdir("CONTENT")
                    if not os.path.isdir("CONTENT/" + date):
                        os.mkdir("CONTENT/" + date)
                    ftime = datetime.datetime.now().strftime("%H-%M-%S")
                    new_file = "CONTENT/" + date + "/" + ftime + "_" + filename
                    f = open(new_file, 'wb')
                    print "Got new file %s from %s" % (new_file, to)
                    got.append(filename.encode("utf-8"))
                    f.write(data)
                    f.close()

            session.close()
            session.logout()
            time.sleep(60)
    except:
        print "TARFUN!"

The problem is that the last print shows garbage, for example:
=?UTF-8?B?0YfQsNGB0YLRjCAxINGC0LXQutGB0YIg0LzQtdGC0L7QtNC40YfQutC4LmRv?=
so later checks don't work. On Linux it works just fine. So far I have tried decoding and encoding the filename as UTF-8, but it did nothing. Thanks in advance.

Answer

If you read the spec that defines the filename field, RFC 2183, section 2.3, it says:

Current [RFC 2045] grammar restricts parameter values (and hence Content-Disposition filenames) to US-ASCII. We recognize the great desirability of allowing arbitrary character sets in filenames, but it is beyond the scope of this document to define the necessary mechanisms. We expect that the basic [RFC 1521] 'value' specification will someday be amended to allow use of non-US-ASCII characters, at which time the same mechanism should be used in the Content-Disposition filename parameter.

There are proposed RFCs to handle this. In particular, it's been suggested that filenames be handled as encoded-words, as defined by RFC 5987, RFC 2047, and RFC 2231. In brief this means either RFC 2047 format:

"=?" charset "?" encoding "?" encoded-text "?="

… or RFC 2231 format:

"=?" charset ["*" language] "?" encoded-text "?="

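Some mail agents already use this functionality; others don't know what to do with it. The email package in Python 2.x is among the latter as far as get_filename() is concerned: it hands back the encoded-word untouched, as your output shows. (It's possible that the later version in Python 3.x does better, or that it may change in the future, but that won't help you if you want to stick with 2.x.) So, if you want a readable filename, you have to decode it yourself. The standard library's email.header.decode_header function does parse RFC 2047 encoded-words, so you don't have to start from scratch.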

In your example, you've got a filename in RFC 2047 format, with charset UTF-8 (which is usable directly as a Python encoding name), encoding B, which means Base-64, and content 0YfQsNGB0YLRjCAxINGC0LXQutGB0YIg0LzQtdGC0L7QtNC40YfQutC4LmRv. So, you have to base-64 decode that, then UTF-8-decode that, and you get u'часть 1 текст методички.do'.
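The two decoding steps can be sketched as follows. This is a minimal illustration that handles only the single "B"-encoded word from the question; the regular expression and the variable names are my own and not part of any library.

```python
import base64
import re

# The header value from the question.
raw = "=?UTF-8?B?0YfQsNGB0YLRjCAxINGC0LXQutGB0YIg0LzQtdGC0L7QtNC40YfQutC4LmRv?="

# Split the RFC 2047 encoded-word into charset, encoding and encoded-text.
match = re.match(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=$", raw)
if match is None:
    raise ValueError("not a single RFC 2047 encoded-word: %r" % raw)
charset, encoding, encoded_text = match.groups()

# Only the "B" (Base64) case from the question is handled in this sketch.
if encoding.upper() != "B":
    raise ValueError("unsupported encoding: %r" % encoding)

# Base64-decode to bytes, then decode those bytes with the declared charset.
decoded = base64.b64decode(encoded_text).decode(charset)
print(repr(decoded))
```

The "Q" (quoted-printable style) encoding and headers made of several encoded-words are deliberately left out here.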

If you want to do this more generally, you're going to have to write code which tries to interpret each filename in RFC 2231 format if possible, in RFC 2047 format otherwise, and does the appropriate decoding steps. A complete version of this code is too involved to write out in full in a StackOverflow answer, but the basic idea is pretty simple, as demonstrated above, so you should be able to write it yourself. You may also want to search PyPI for existing implementations.
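As a starting point, here is a rough sketch that leans on the standard library instead of hand-parsing. It assumes that get_filename() already resolves RFC 2231 style parameters (filename*=...) on its own, which as far as I know it does, and passes whatever comes back through email.header.decode_header to handle RFC 2047 encoded-words. The helper name decode_filename and the sample message are made up for illustration.

```python
from email import message_from_string
from email.header import decode_header


def decode_filename(raw):
    """Hypothetical helper: return `raw` as text, decoding RFC 2047 encoded-words."""
    if raw is None:
        return None
    pieces = []
    for chunk, charset in decode_header(raw):
        # Encoded parts come back as bytes together with their charset name;
        # plain parts may already be text (Python 3) or ASCII bytes (Python 2).
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", "replace")
        pieces.append(chunk)
    return u"".join(pieces)


# A made-up message carrying the filename from the question.
msg = message_from_string(
    'Content-Type: application/msword\n'
    'Content-Disposition: attachment; '
    'filename="=?UTF-8?B?0YfQsNGB0YLRjCAxINGC0LXQutGB0YIg0LzQtdGC0L7QtNC40YfQutC4LmRv?="\n'
    '\n'
    'body\n'
)
name = decode_filename(msg.get_filename())
print(repr(name))
```

In the script from the question, this would mean calling the helper on the result of part.get_filename() before the extension check. Unknown charset names and malformed encoded-words are not handled in this sketch.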
