Unexpected behavior of universal newline mode with StringIO and csv modules

2024/9/20 9:29:04

Consider the following (Python 3.2 under Windows):

>>> import io
>>> import csv
>>> output = io.StringIO()         # default parameter newline=None
>>> csvdata = [1, 'a', 'Whoa!\nNewlines!']
>>> writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
>>> writer.writerow(csvdata)
25
>>> output.getvalue()
'1,"a","Whoa!\nNewlines!"\r\n'

Why is there a single \n - shouldn't it have been converted to \r\n since universal newlines mode is enabled?

With this enabled, on input, the lines endings \n, \r, or \r\nare translated to \n before being returned to the caller.Conversely, on output, \n is translated to the system default lineseparator, os.linesep.

Answer

The "single" \n occurs as a data character inside the third field. Consequently that field is quoted so that a csv reader will treat it as part of the data. It is NOT a "line terminator" (should be called a row separator) or part thereof. To get a better appreciation of the quoting, remove the quoting=csv.QUOTE_NONNUMERIC.

The \r\n is produced because csv terminates rows with the dialect.lineterminator whose default is \r\n. In other words, the "universal newlines" setting is ignored.

Update

The 2.7 and 3.2 docs for io.StringIO are virtually identical as far as the newline arg is concerned.

The newline argument works like that of TextIOWrapper. The default isto do no newline translation.

We'll examine the first sentence below. The second sentence is true for output, depending on your interpretation of "default" and "newline translation".

TextIOWrapper docs:

newline can be None, '', '\n', '\r', or '\r\n'. It controls thehandling of line endings. If it is None, universal newlines isenabled. With this enabled, on input, the lines endings '\n', '\r', or'\r\n' are translated to '\n' before being returned to the caller.Conversely, on output, '\n' is translated to the system default lineseparator, os.linesep. If newline is any other of its legal values,that newline becomes the newline when the file is read and it isreturned untranslated. On output, '\n' is converted to the newline.

Python 3.2 on Windows:

>>> from io import StringIO as S
>>> import os
>>> print(repr(os.linesep))
'\r\n'
>>> ss = [S()] + [S(newline=nl) for nl in (None, '', '\n', '\r', '\r\n')]
>>> for x, s in enumerate(ss):
...     m = s.write('foo\nbar\rzot\r\n')
...     v = s.getvalue()
...     print(x, m, len(v), repr(v))
...
0 13 13 'foo\nbar\rzot\r\n'
1 13 12 'foo\nbar\nzot\n'
2 13 13 'foo\nbar\rzot\r\n'
3 13 13 'foo\nbar\rzot\r\n'
4 13 13 'foo\rbar\rzot\r\r'
5 13 15 'foo\r\nbar\rzot\r\r\n'
>>>

Line 0 shows that the "default" that you get with no newline arg involves no translation of \n (or any other character). It is certainly NOT converting '\n' to os.linesep

Line 1 shows that what you get with newline=None (should be the same as line 0, shouldn't it??) is in effect INPUT universal newlines translation -- bizarre!

Line 2: newline='' does no change, like line 0. It is certainly NOT converting '\n' to ''.

Lines 3, 4, and 5: as the docs say, '\n' is converted to the value of the newline arg.

The equivalent Python 2.X code produces equivalent results with Python 2.7.2.

Update 2 For consistency with built-in open(), the default should be os.linesep, as documented. To get the no-translation-on-output behaviour, use newline=''. Note: the open() docs are much clearer. I'll submit a bug report tomorrow.

https://en.xdnf.cn/q/72365.html

Related Q&A

logger chain in python

Im writing python package/module and would like the logging messages mention what module/class/function they come from. I.e. if I run this code:import mymodule.utils.worker as workerw = worker.Worker()…

How to make data to be shown in tabular form in discord.py?

Hi I am creating a bot that makes points table/leaderboard , below is the code which works really nice. def check(ctx):return lambda m: m.author == ctx.author and m.channel == ctx.channelasync def get_…

Getting Python version using Go

Im trying to get my Python version using Go:import ("log""os/exec""strings" )func verifyPythonVersion() {_, err := exec.LookPath("python")if err != nil {log.Fata…

Python shutil.copytree() is there away to track the status of the copying

I have a lot of raster files (600+) in directories that I need copy into a new location (including their directory structure). Is there a way to track the status of the copying using shutil.copytree()?…

Py2exe error: [Errno 2] No such file or directory

C:\Users\Shalia\Desktop\accuadmin>python setup_py2exe.py py2exe running py2exe10 missing Modules------------------ ? PIL._imagingagg imported from PIL.ImageDraw ? PyQt4 …

pandas rolling window mean in the future

I would like to use the pandas.DataFrame.rolling method on a data frame with datetime to aggregate future values. It looks it can be done only in the past, is it correct?

When should I use type checking (if ever) in Python?

Im starting to learn Python and as a primarily Java developer the biggest issue I am having is understanding when and when not to use type checking. Most people seem to be saying that Python code shoul…

Kartograph python script generates SVG with incorrect lat/long coords

I have modified this question to reflect some progress on discovering the problem. I am using the following python code to generate an SVG of the continental USA. The shapefile is irrelevant, as the …

Python multiprocessing blocks indefinately in waiter.acquire()

Can someone explain why this code blocks and cannot complete?Ive followed a couple of examples for multiprocessing and Ive writting some very similar code that does not get blocked. But, obviously, I…

What is the best way to control Twisteds reactor so that it is nonblocking?

Instead of running reactor.run(), Id like to call something else (I dunno, like reactor.runOnce() or something) occasionally while maintaining my own main loop. Is there a best-practice for this with …