Replacing punctuation in a data frame based on punctuation list [duplicate]

2024/9/28 7:18:48

Using Canopy and Pandas, I have data frame a which is defined by:

a = pd.read_csv('text.txt')
df = pd.DataFrame(a)
df.columns = ["test"]

test.txt is a single-column file containing a list of strings made up of text, numbers and punctuation.

Assuming df looks like:


test

%hgh&12

abc123!!!

porkyfries


I want my results to be:


test

hgh12

abc123

porkyfries


Effort so far:

from string import punctuation  # import the punctuation list from Python itself
a = pd.read_csv('text.txt')
df = pd.DataFrame(a)
df.columns = ["test"]  # define the data frame
for p in list(punctuation):
    df2 = df.test.str.replace(p, '')
    df2 = pd.DataFrame(df2)
    df2

The command above basically just returns the same data set. I would appreciate any leads.

Edit: The reason I am using Pandas is that the data is huge, about 1M rows, and the code will later be applied to lists of up to 30M rows. Long story short, I need to clean the data very efficiently for big data sets.

Answer

To remove punctuation from a text column in your dataframe:

In:

import re
import string
rem = string.punctuation
pattern = r"[{}]".format(rem)
pattern

Out:

'[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~]'
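One caveat about the character class above: inside the brackets, the sequence `\]` is read as an escaped `]`, so a literal backslash is not matched. A minimal sketch of a safer variant, assuming you are happy to use `re.escape` (the names `safe_pattern`, `raw_pattern` and `sample` are only for illustration):

```python
import re
import string

# Escape every punctuation character so none is read as regex syntax.
safe_pattern = "[{}]".format(re.escape(string.punctuation))

# The unescaped class, built the same way as in the answer.
raw_pattern = "[{}]".format(string.punctuation)

# Illustrative input containing a backslash, ']', '^' and '-'.
sample = "a\\b]c^d-e"

cleaned_raw = re.sub(raw_pattern, "", sample)    # backslash survives
cleaned_safe = re.sub(safe_pattern, "", sample)  # backslash removed too
```

For data without backslashes both patterns behave the same, so this only matters if that character can appear in your column.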

In:

df = pd.DataFrame({'text':['book...regh', 'book...', 'boo,', 'book. ', 'ball, ', 'ballnroll"', '"rope"', 'rick % ']})
df

Out:

        text
0  book...regh
1      book...
2         boo,
3       book. 
4       ball, 
5   ballnroll"
6       "rope"
7      rick % 

In:

df['text'] = df['text'].str.replace(pattern, '', regex=True)  # regex=True is required on pandas 2.0+, where the default is a literal match
df

Instead of removing the matches, you can replace them with any character you like, e.g. replace(pattern, '$', regex=True).

Out:

        text
0   bookregh
1       book
2        boo
3      book 
4      ball 
5  ballnroll
6       rope
7     rick  
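
Applied to the data from the question, the same idea might look like the sketch below. It assumes the column is named `test` as in the question; `by_regex`, `by_translate` and `table` are illustrative names. The second option uses `str.translate`, which is often suggested for very large columns, though it is worth timing both on your own data before settling on one.

```python
import re
import string

import pandas as pd

# Sample frame matching the question; the column name "test" comes from there.
df = pd.DataFrame({"test": ["%hgh&12", "abc123!!!", "porkyfries"]})

# Option 1: one vectorised regex pass over the whole column.
pattern = "[{}]".format(re.escape(string.punctuation))
by_regex = df["test"].str.replace(pattern, "", regex=True)

# Option 2: a translation table that deletes every punctuation character.
table = str.maketrans("", "", string.punctuation)
by_translate = df["test"].str.translate(table)
```

Both options work on the whole column at once rather than looping over punctuation characters, which also avoids the problem in the original loop, where every iteration started again from the unmodified `df`.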
