Match unescaped quotes in quoted CSV

2024/11/15 10:41:51

I've looked at several of the Stack Overflow posts with similar titles, and none of the accepted answers have done the trick for me.

I have a CSV file where each "cell" of data is delimited by a comma and is quoted (including numbers). Each line ends with a newline character.

Some text "cells" have quotation marks in them, and I want to use regex to find these, so that I can escape them properly.

Example line:

"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n

I want to match just the " in E 60" and the " in AD"8, but none of the other quotes.

What is a (preferably Python-friendly) regular expression that I can use to do this?

Answer

EDIT: Updated with regex from @sundance to avoid beginning of line and newline.

You could try substituting only quotes that aren't next to a comma, start of line, or newline:

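import re

# Replaces each stray quote with an empty string, i.e. removes it.
# To escape the quote instead, pass a different replacement string.
newline = re.sub(r'(?<!^)(?<!,)"(?!,|$)', '', line)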
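Since the question asks about escaping the quotes rather than deleting them, here is a minimal sketch that reuses the same pattern on the example line from the question. It assumes the usual CSV convention of doubling a quote inside a quoted field; the names `STRAY_QUOTE` and `escape_stray_quotes` are made up for this illustration and are not part of the original answer.

```python
import csv
import re

# Same pattern as in the answer: a quote that is not at the start of the line,
# not directly after a comma, and not directly before a comma or the line end.
STRAY_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')


def escape_stray_quotes(line):
    """Double each stray quote (hypothetical helper, assumes "" is the escape)."""
    return STRAY_QUOTE.sub('""', line)


# Example line from the question, ending with a real newline character.
line = '"0","0.23432","234.232342","data here dsfsd hfsdf","3/1/2016",,"etc","E 60"","AD"8"\n'

fixed = escape_stray_quotes(line)
print(fixed)

# Sanity check: the csv module should now read the line as nine cells.
row = next(csv.reader([fixed]))
print(row)
```

Note that the pattern relies on every stray quote sitting away from a comma. A quote that appears right next to a comma inside a cell looks the same as a real delimiter and will probably be skipped, so it is worth checking the cell count per row after the substitution.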
