How to write a Unicode CSV in Python 2.7

2024/9/8 10:21:11

I want to write data to a CSV file, where each row should look like this list (copied directly from the Python console):

row = ['\xef\xbb\xbft_11651497', 'http://kozbeszerzes.ceu.hu/entity/t/11651497.xml', "Szabolcs Mag '98 Kft.", 'ny\xc3\xadregyh\xc3\xa1za', 'ny\xc3\xadregyh\xc3\xa1za', '4400', 't\xc3\xbcnde utca 20.', 47.935175, 21.744975, u'Ny\xedregyh\xe1za', u'Borb\xe1nya', u'Szabolcs-Szatm\xe1r-Bereg', u'Ny\xedregyh\xe1zai', u'20', u'T\xfcnde utca', u'Magyarorsz\xe1g', u'4405']

Python 2's csv module does not support Unicode input, but I had a UnicodeWriter wrapper:

import csv, cStringIO, codecs

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

However, these lines still produce the dreaded encoding error message below:

f.write(codecs.BOM_UTF8)
writer = UnicodeWriter(f)
writer.writerow(row)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)

What can I do about this? Thanks!

Answer

You are passing in bytestrings that contain non-ASCII data, and these are decoded to Unicode with the default codec (ASCII) at this line:

self.writer.writerow([unicode(s).encode("utf-8") for s in row])

unicode(bytestring) with data that cannot be decoded as ASCII fails:

>>> unicode('\xef\xbb\xbft_11651497')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
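The same failure can be reproduced without calling unicode() by decoding the bytes explicitly with the ASCII codec, which is effectively what the default codec does here. This is a minimal sketch meant to run on both Python 2.7 and Python 3; the helper name try_ascii_decode is hypothetical:

```python
from __future__ import print_function


def try_ascii_decode(raw):
    """Return (ok, value_or_error) after decoding raw bytes as ASCII."""
    try:
        return True, raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return False, exc


# First field of the question's row, as bytes (it starts with a UTF-8 BOM).
ok, err = try_ascii_decode(b"\xef\xbb\xbft_11651497")
# Byte 0xef is outside the ASCII range 0-127, so decoding fails at offset 0.
print(ok, err.encoding, err.start)

# Pure-ASCII bytes such as the postcode decode without trouble.
ok_zip, zip_code = try_ascii_decode(b"4400")
```

Only values that contain bytes above 127 trigger the error, which is why some fields in the row (like the postcodes) pass through unicode() and others do not.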

Decode the data to Unicode before passing it to the writer:

row = [v.decode('utf8') if isinstance(v, str) else v for v in row]
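The one-liner can be wrapped in a small helper and applied to every row before it reaches writer.writerow(). This sketch assumes every bytestring is UTF-8; the name decode_row is hypothetical. It checks isinstance(v, bytes) rather than str so the same code behaves sensibly on Python 2.7 (where bytes is an alias of str) and on Python 3:

```python
def decode_row(row, encoding="utf-8"):
    """Decode every bytestring in row; leave unicode values, floats, etc. as-is."""
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in row]


# A few values from the question's row: UTF-8 bytes, a float and a unicode string.
sample = [b"ny\xc3\xadregyh\xc3\xa1za", 47.935175, u"Ny\xedregyh\xe1za"]
decoded = decode_row(sample)
```

With the UnicodeWriter above, the call would then be writer.writerow(decode_row(row)), so unicode(s) only ever sees Unicode strings or non-string values.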

This assumes that your bytestring values contain UTF-8 data. If you have a mix of encodings, decode to Unicode at the point of origin, where your program first sourced the data. You really want to do that anyway, regardless of where the data came from and even if it was already encoded as UTF-8.
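The first value in the question's row starts with \xef\xbb\xbf, a UTF-8 byte order mark that was presumably read from the start of the source file. Decoding it with plain 'utf8' keeps the mark as the character u'\ufeff', whereas Python's 'utf-8-sig' codec strips a leading BOM, which makes it a reasonable choice when decoding at the point of origin. A small sketch (variable names are illustrative only):

```python
raw = b"\xef\xbb\xbft_11651497"

# Plain UTF-8 keeps the byte order mark as the character U+FEFF.
with_bom = raw.decode("utf-8")

# 'utf-8-sig' removes a leading BOM when one is present...
without_bom = raw.decode("utf-8-sig")

# ...and decodes BOM-less input exactly like plain UTF-8.
no_bom_input = b"t_11651497".decode("utf-8-sig")
```

Since the question already writes codecs.BOM_UTF8 at the start of the output file, keeping the mark in the first field would put a second BOM inside the first cell.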

https://en.xdnf.cn/q/72875.html
