Remove non-ASCII characters from string columns in pandas

2024/11/13 9:32:49

I have a pandas DataFrame with multiple columns in which the values are mixed with unwanted characters.

columnA        columnB    columnC        ColumnD
\x00A\x00B     NULL       \x00C\x00D     123
\x00E\x00F     NULL       NULL           456

What I'd like to do is transform this DataFrame into the following:

columnA        columnB    columnC        ColumnD
AB             NULL       CD             123
EF             NULL       NULL           456

With my code below, I can remove '\x00' from columnA, but columnC is tricky because it contains NULL in some rows.

col_names = cols_to_clean
fixer = dict.fromkeys([0x00], u'')
for i in col_names:
    if df[i].isnull().any() == False:
        if df[i].dtype != np.int64:
            df[i] = df[i].map(lambda x: x.translate(fixer))

Is there an efficient way to remove the unwanted characters from columnC?

Answer

In general, to remove non-ASCII characters, use str.encode with errors='ignore':

df['col'] = df['col'].str.encode('ascii', 'ignore').str.decode('ascii')
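
As a minimal sketch of what that round trip does, assuming a made-up Series (the sample values are not from the question): characters outside ASCII are dropped, missing values stay missing, and '\x00' survives because it is itself an ASCII character.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: accented letters, a NUL character, and a missing value.
s = pd.Series(["café", "naïve", "A\x00B", np.nan])

# Encode to ASCII bytes, silently dropping anything that cannot be encoded,
# then decode back to text. Missing values are propagated by the .str accessor.
cleaned = s.str.encode("ascii", "ignore").str.decode("ascii")

print(cleaned.tolist())
```

The third value still contains the NUL character after this step.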

To perform this on multiple string columns, use:

u = df.select_dtypes(object)
df[u.columns] = u.apply(lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
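
A small illustration of the multi-column version, under the assumption that the text columns have object dtype (the frame and its column names are invented for the example). The integer column is not selected, so it is left alone.

```python
import pandas as pd

# Hypothetical frame: two object-dtype text columns and one integer column.
df = pd.DataFrame({
    "name": pd.Series(["café", None], dtype=object),
    "city": pd.Series(["Zürich", "Oslo"], dtype=object),
    "count": [1, 2],
})

# Pick only the object-dtype columns and clean each one.
u = df.select_dtypes(include="object")
df[u.columns] = u.apply(
    lambda col: col.str.encode("ascii", "ignore").str.decode("ascii")
)

print(df)
```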

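That still won't handle the null characters in your columns, though, because '\x00' is itself an ASCII character. For those, use a regex replacement, which leaves missing values untouched. Note that \W+ strips every non-word character (spaces and punctuation included), not only '\x00':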

df2 = df.replace(r'\W+', '', regex=True)
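
If only the null characters should go, a narrower pattern may be safer. The sketch below rebuilds the question's data under the assumption that NULL means a missing value (NaN) rather than the literal string, and swaps in r'\x00' for \W+. That is a variation on the line above, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's frame; NULL is assumed to mean NaN.
df = pd.DataFrame({
    "columnA": ["\x00A\x00B", "\x00E\x00F"],
    "columnB": [np.nan, np.nan],
    "columnC": ["\x00C\x00D", np.nan],
    "ColumnD": [123, 456],
})

# Remove only NUL characters. Regex replacement applies to string cells,
# so NaN entries and the integer column pass through unchanged.
df2 = df.replace(r"\x00", "", regex=True)

# For comparison: \W+ also removes spaces and punctuation.
broad = pd.Series(["a b-c"]).replace(r"\W+", "", regex=True)

print(df2)
print(broad.tolist())
```

This handles columnC without the per-column null check from the question, since rows with missing values are simply skipped.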