Unexpected behavior of universal newline mode with StringIO and csv modules

2024/9/20 9:29:04

Consider the following (Python 3.2 under Windows):

>>> import io
>>> import csv
>>> output = io.StringIO()         # default parameter newline=None
>>> csvdata = [1, 'a', 'Whoa!\nNewlines!']
>>> writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
>>> writer.writerow(csvdata)
25
>>> output.getvalue()
'1,"a","Whoa!\nNewlines!"\r\n'

Why is there a single \n - shouldn't it have been converted to \r\n since universal newlines mode is enabled?

The docs say: "With this enabled, on input, the lines endings \n, \r, or \r\n are translated to \n before being returned to the caller. Conversely, on output, \n is translated to the system default line separator, os.linesep."


The "single" \n occurs as a data character inside the third field. Consequently that field is quoted so that a csv reader will treat it as part of the data. It is NOT a "line terminator" (which should really be called a row separator) or part of one. To get a better appreciation of the quoting, remove the quoting=csv.QUOTE_NONNUMERIC: with the default QUOTE_MINIMAL, only the third field is quoted.
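A minimal sketch of the quoting point, assuming a current Python 3 rather than the 3.2 used in the question. It formats the question's row with the default QUOTE_MINIMAL and with QUOTE_NONNUMERIC, then reads it back with csv.reader to show that the quoted \n comes back as part of the third field. The helper name `write_row` is hypothetical.

```python
import csv
import io

def write_row(row, **fmtparams):
    """Hypothetical helper: format one row with csv.writer and return the text."""
    buf = io.StringIO(newline='')  # no newline translation of any kind
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

row = [1, 'a', 'Whoa!\nNewlines!']

minimal = write_row(row)                                   # default QUOTE_MINIMAL
nonnumeric = write_row(row, quoting=csv.QUOTE_NONNUMERIC)  # as in the question

print(repr(minimal))     # only the field containing '\n' is quoted
print(repr(nonnumeric))  # every non-numeric field is quoted

# Reading back: the quoted '\n' is data inside field 3, not a row break.
parsed = list(csv.reader(io.StringIO(minimal, newline='')))
print(parsed)
```

Note that the reader returns every field as a string, so the 1 comes back as '1'.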

The \r\n is produced because csv terminates each row with the dialect's lineterminator, whose default is '\r\n'. csv writes that separator itself, so in other words the "universal newlines" setting is ignored.
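A short sketch showing that the row separator comes from the csv dialect rather than from the stream, assuming a current Python 3. Changing `lineterminator` changes only the row ending; the \n embedded in the third field is untouched. The helper name `render` is hypothetical.

```python
import csv
import io

row = [1, 'a', 'Whoa!\nNewlines!']

def render(lineterminator):
    """Hypothetical helper: write one row with the given row separator."""
    buf = io.StringIO()  # current docs give newline='\n' as the default: no translation
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator=lineterminator)
    writer.writerow(row)
    return buf.getvalue()

print(repr(csv.excel.lineterminator))  # row separator of the default 'excel' dialect

default = render('\r\n')  # same as not passing lineterminator at all
unix = render('\n')       # only the row ending changes

print(repr(default))
print(repr(unix))
```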


The 2.7 and 3.2 docs for io.StringIO are virtually identical as far as the newline arg is concerned.

The newline argument works like that of TextIOWrapper. The default is to do no newline translation.

We'll examine the first sentence below. The second sentence is true for output, depending on your interpretation of "default" and "newline translation".

TextIOWrapper docs:

newline can be None, '', '\n', '\r', or '\r\n'. It controls the handling of line endings. If it is None, universal newlines is enabled. With this enabled, on input, the lines endings '\n', '\r', or '\r\n' are translated to '\n' before being returned to the caller. Conversely, on output, '\n' is translated to the system default line separator, os.linesep. If newline is any other of its legal values, that newline becomes the newline when the file is read and it is returned untranslated. On output, '\n' is converted to the newline.
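To see these TextIOWrapper rules in isolation, one can wrap an in-memory BytesIO and inspect the raw bytes on the way out and the decoded text on the way in. This is a sketch assuming a current Python 3; the helper names `encode_text` and `decode_bytes` are hypothetical.

```python
import io
import os

SAMPLE = 'foo\nbar\rzot\r\n'

def encode_text(text, newline):
    """Hypothetical helper (output side): write text, return the raw bytes."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding='ascii', newline=newline)
    wrapper.write(text)
    wrapper.detach()  # flushes and leaves `raw` open for getvalue()
    return raw.getvalue()

def decode_bytes(data, newline):
    """Hypothetical helper (input side): read bytes, return the decoded text."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding='ascii', newline=newline)
    return wrapper.read()

print('os.linesep =', repr(os.linesep))
for nl in (None, '', '\n', '\r', '\r\n'):
    out = encode_text(SAMPLE, nl)
    back = decode_bytes(SAMPLE.encode('ascii'), nl)
    print(repr(nl), out, repr(back))
```

With newline=None the output side maps each '\n' to os.linesep, while the input side folds '\r' and '\r\n' to '\n'; the other values translate only on output.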

Python 3.2 on Windows:

>>> from io import StringIO as S
>>> import os
>>> print(repr(os.linesep))
'\r\n'
>>> ss = [S()] + [S(newline=nl) for nl in (None, '', '\n', '\r', '\r\n')]
>>> for x, s in enumerate(ss):
...     m = s.write('foo\nbar\rzot\r\n')
...     v = s.getvalue()
...     print(x, m, len(v), repr(v))
0 13 13 'foo\nbar\rzot\r\n'
1 13 12 'foo\nbar\nzot\n'
2 13 13 'foo\nbar\rzot\r\n'
3 13 13 'foo\nbar\rzot\r\n'
4 13 13 'foo\rbar\rzot\r\r'
5 13 15 'foo\r\nbar\rzot\r\r\n'

Line 0 shows that the "default" you get with no newline arg involves no translation of \n (or any other character). It is certainly NOT converting '\n' to os.linesep.

Line 1 shows that what you get with newline=None (which should be the same as line 0, shouldn't it??) is in effect INPUT universal newlines translation: \r and \r\n are turned into \n as they are written. Bizarre!

Line 2: newline='' makes no change, like line 0. It is certainly NOT converting '\n' to ''.

Lines 3, 4, and 5: as the docs say, '\n' is converted to the value of the newline arg.
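The mismatch between lines 0/1 and the TextIOWrapper documentation can be checked directly by writing the same string through io.StringIO and through a TextIOWrapper with newline=None. This is a sketch assuming a current Python 3, where the io.StringIO docs give newline='\n' as the default and say that with newline=None newlines are written as '\n' on all platforms; the helper names `via_stringio` and `via_textiowrapper` are hypothetical.

```python
import io
import os

SAMPLE = 'foo\nbar\rzot\r\n'

def via_stringio(text, **kwargs):
    """Hypothetical helper: write text into io.StringIO(**kwargs), return getvalue()."""
    s = io.StringIO(**kwargs)
    s.write(text)
    return s.getvalue()

def via_textiowrapper(text, newline):
    """Hypothetical helper: write through TextIOWrapper over BytesIO, return the text."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding='ascii', newline=newline)
    wrapper.write(text)
    wrapper.detach()  # flushes and leaves `raw` open
    return raw.getvalue().decode('ascii')

print(repr(via_stringio(SAMPLE)))                # table line 0: untouched
print(repr(via_stringio(SAMPLE, newline=None)))  # table line 1: '\r', '\r\n' -> '\n'
print(repr(via_textiowrapper(SAMPLE, None)))     # '\n' -> os.linesep, '\r' untouched
```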

The equivalent Python 2.X code produces equivalent results with Python 2.7.2.

Update 2: For consistency with built-in open(), the default output newline should be os.linesep, as documented. To get the no-translation-on-output behaviour, use newline=''. Note: the open() docs are much clearer. I'll submit a bug report tomorrow.
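A sketch of the newline='' advice applied to a real file, assuming a current Python 3. It writes the question's row through open() with newline='' and with newline=None and compares the raw bytes. The helper `csv_file_bytes` and the file name `demo.csv` are hypothetical.

```python
import csv
import os
import tempfile

row = [1, 'a', 'Whoa!\nNewlines!']

def csv_file_bytes(newline):
    """Hypothetical helper: write one CSV row via open(..., newline=newline)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo.csv')  # hypothetical file name
        with open(path, 'w', encoding='ascii', newline=newline) as f:
            csv.writer(f, quoting=csv.QUOTE_NONNUMERIC).writerow(row)
        with open(path, 'rb') as f:
            return f.read()

untranslated = csv_file_bytes('')  # exactly what csv produced
translated = csv_file_bytes(None)  # every '\n' becomes os.linesep on output

print(untranslated)
print(translated)  # on Windows the row ends in b'\r\r\n'
```

With newline=None on Windows, both the embedded \n and the \n of csv's \r\n terminator are expanded, which is why newline='' is the safe choice for csv output.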


