Find delimiter in txt to convert to csv using Python

2024/10/2 6:28:24

I have to convert some txt files to csv (and perform some operations during the conversion).

I use the csv.Sniffer() class to detect which delimiter is used in the txt file.

This code:

    with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
        dialect = csv.Sniffer().sniff(f1.read(1024))  #### detect delimiters
        f1.seek(0)
        r = csv.reader(f1, delimiter=dialect)
        writer = csv.writer(f2, delimiter=';')

fails with: Error: Could not determine delimiter

This works:

    with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
        #dialect = csv.Sniffer().sniff(f1.read(1024))  #### detect delimiters
        #f1.seek(0)
        r = csv.reader(f1, delimiter='\t')
        writer = csv.writer(f2, delimiter=';')

or:

    with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
        #dialect = csv.Sniffer().sniff(f1.read(1024))  #### detect delimiters
        #f1.seek(0)
        r = csv.reader(f1, dialect="excel-tab")
        writer = csv.writer(f2, delimiter=';')
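Since hard-coding the tab delimiter works, one defensive pattern is to try sniffing first and fall back to the "excel-tab" dialect when sniff() raises csv.Error ("Could not determine delimiter"). This is a sketch under assumptions: the helper name open_reader is hypothetical, tab is assumed to be the right fallback for these files, and the candidate set '\t;,' is an illustrative choice.

```python
import csv


def open_reader(f, sample_size=1024, delimiters='\t;,'):
    """Return a csv.reader over f, sniffing the delimiter when possible.

    open_reader, sample_size and delimiters are illustrative names/defaults.
    """
    sample = f.read(sample_size)
    f.seek(0)  # rewind so the reader starts at the first row
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters)
    except csv.Error:
        # Sniffing failed: assume the file is tab-delimited.
        dialect = 'excel-tab'
    return csv.reader(f, dialect)
```

The fallback keeps the conversion running on files the sniffer cannot classify, at the cost of silently assuming tabs for them.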

This is an example row from the txt file (10 fields delimited by tabs):

166 14908941    sa_s    NOVA i  7.05    DEa 7.17    Ncava - Deo mo  7161    4,97

Why doesn't the csv.Sniffer() class work?

The problem was reading only the first 1024 bytes to analyze the whole txt file (maybe that is not enough to detect the delimiter). Now this code works with no other changes:

    with open(filename_input, 'r') as f1, open(filename_output, 'wb') as f2:
        dialect = csv.Sniffer().sniff(f1.read())  #### error with dialect = csv.Sniffer().sniff(f1.read(1024))
        f1.seek(0)
        r = csv.reader(f1, delimiter=dialect)
        writer = csv.writer(f2, delimiter=';')
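One plausible contributor (an inference, not something the question confirms) is that read(1024) almost always stops in the middle of a row, and the sniffer counts that truncated last line when it checks how consistently each character appears per line. Reading the whole file avoids this but loads everything into memory. A middle ground is to sample a fixed number of complete lines; the helper name sniff_lines and the default of 20 lines are hypothetical choices for this sketch.

```python
import csv
from itertools import islice


def sniff_lines(f, max_lines=20, delimiters=None):
    """Sniff a dialect from the first max_lines complete lines of f.

    max_lines=20 is an arbitrary illustrative default.
    """
    # Iterating a text file yields whole lines, so the sample never ends
    # in the middle of a row the way f.read(1024) usually does.
    sample = ''.join(islice(f, max_lines))
    f.seek(0)  # rewind so the caller can read from the start
    return csv.Sniffer().sniff(sample, delimiters)
```

With a real file, call it on the handle from the with statement (dialect = sniff_lines(f1)) and then build csv.reader(f1, dialect) as usual.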
Answer

You have to use dialect.delimiter instead of just dialect, because sniff() returns a Dialect class, while the delimiter argument of csv.reader needs a single character: that class's delimiter attribute.

rows=csv.reader(f1, delimiter=dialect.delimiter)
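Alternatively, because csv.reader accepts a dialect as its second argument, the sniffed Dialect can be passed as a whole, which also carries over the detected quotechar and skipinitialspace settings. A minimal sketch; read_rows is a hypothetical helper name:

```python
import csv


def read_rows(f, sample_size=1024):
    """Sniff f's dialect and return all of its rows as lists of strings."""
    dialect = csv.Sniffer().sniff(f.read(sample_size))
    f.seek(0)  # rewind so the sampled rows are read too
    # sniff() returns a Dialect subclass, so it can be passed as
    # csv.reader's dialect argument instead of just its .delimiter.
    return list(csv.reader(f, dialect))
```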

The modified code is as follows:

    import csv

    filename_input = 'filein.txt'
    filename_output = 'fileout.csv'

    # On Python 3, open files for the csv module in text mode with newline=''
    with open(filename_input, 'r', newline='') as f1, open(filename_output, 'w', newline='') as f2:
        dialect = csv.Sniffer().sniff(f1.read(1024), "\t")  # detect the delimiter, considering only tab
        f1.seek(0)  # rewind to the start before reading rows
        print(dialect.delimiter)
        rows = csv.reader(f1, delimiter=dialect.delimiter)
        writer = csv.writer(f2, delimiter=';')
        writer.writerows(rows)
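The same conversion can be wrapped in functions so that the sniffing and rewriting logic can be exercised on in-memory text as well as on files. This is a sketch under assumptions: convert_stream and convert_txt_to_csv are hypothetical names, delimiters='\t' assumes the inputs are tab-delimited like the sample row, and the detected delimiter is returned only to make it easy to check.

```python
import csv


def convert_stream(src, dst, sample_size=1024, delimiters='\t'):
    """Copy delimited text from src to dst as ';'-separated CSV.

    Returns the delimiter that was detected in src.
    """
    dialect = csv.Sniffer().sniff(src.read(sample_size), delimiters)
    src.seek(0)  # rewind so the reader sees the sampled rows too
    reader = csv.reader(src, dialect)  # pass the whole sniffed dialect
    writer = csv.writer(dst, delimiter=';')
    writer.writerows(reader)
    return dialect.delimiter


def convert_txt_to_csv(filename_input, filename_output):
    # Python 3: open both files in text mode with newline='' for the csv module
    with open(filename_input, 'r', newline='') as f1, \
         open(filename_output, 'w', newline='') as f2:
        return convert_stream(f1, f2)
```

Note that csv.writer ends each row with '\r\n' by default; pass lineterminator='\n' to csv.writer if plain newlines are wanted.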

Output:

C:\pyp>python.exe txttocsv.py
,
C:\pyp>

Also note this from the documentation:

sniff(sample, delimiters=None)

Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.

Hence, if the delimiter you want to find in your text file is something like # rather than , or ;, you should pass it to sniff() as the second parameter, like this:

dialect = csv.Sniffer().sniff(f1.read(1024), '#') 
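Restricting the candidates also matters for data like the sample row in the question, where a value such as 4,97 uses a decimal comma. If every row carries exactly one comma, the comma is as consistent as the tab, and CPython's Sniffer breaks such ties with an internal preference list that puts ',' before '\t'. The rows below are made up for illustration:

```python
import csv

# Made-up tab-separated rows; each contains exactly one decimal comma.
sample = "166\t14908941\t4,97\n167\t14908942\t5,10\n168\t14908943\t7,16\n"
sniffer = csv.Sniffer()

# Unrestricted: tab and comma are equally consistent, so the tie is
# broken by the sniffer's preference order.
unrestricted = sniffer.sniff(sample).delimiter

# Restricted: only tab is considered as a delimiter.
tab_only = sniffer.sniff(sample, delimiters='\t').delimiter

# The same mechanism for an unusual delimiter such as '#'.
hash_only = sniffer.sniff("a#b#c\n1#2#3\n", delimiters='#').delimiter

print(repr(unrestricted), repr(tab_only), repr(hash_only))
```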

Update: to analyze the whole file instead of a 1024-byte sample, use:

dialect = csv.Sniffer().sniff(f1.read()) 
https://en.xdnf.cn/q/70884.html
