Question 1

I have a dataframe dataSwiss which contains information about Swiss municipalities. I want to replace the accented letters with their plain equivalents.

This is what I am doing:

dataSwiss['Municipality'] = dataSwiss['Municipality'].str.encode('utf-8')
dataSwiss['Municipality'] = dataSwiss['Municipality'].str.replace(u"é", "e")

but I get the following error:

----> 2 dataSwiss['Municipality'] = dataSwiss['Municipality'].str.replace(u"é", "e")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

The data looks like:

dataSwiss.Municipality
0               Zürich
1               Zürich
2               Zürich
3               Zürich
4               Zürich
5               Zürich
6               Zürich
7               Zürich

I found the solution:

s = dataSwiss['Municipality']
res = s.str.decode('utf-8')
res = res.str.replace(u"é", "e")

Question 2

This is one way: normalize the strings to NFKD, encode them to ASCII bytes while ignoring anything that cannot be encoded, then decode the bytes back to strings.

import pandas as pd

s = pd.Series(['hello', 'héllo', 'Zürich', 'Zurich'])

res = s.str.normalize('NFKD')\
       .str.encode('ascii', errors='ignore')\
       .str.decode('utf-8')

print(res)

0     hello
1     hello
2    Zurich
3    Zurich
dtype: object
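Applied to the column from the question, the same chain might look like the sketch below. The three municipality names are made-up sample data, since the real contents of dataSwiss are not shown in full, and the sketch assumes Python 3, where the column already holds Unicode strings.

```python
import pandas as pd

# Illustrative data only; the real dataSwiss comes from the question.
dataSwiss = pd.DataFrame({'Municipality': ['Zürich', 'Genève', 'Neuchâtel']})

# Decompose accented letters, drop the non-ASCII combining marks,
# then turn the resulting bytes back into strings.
dataSwiss['Municipality'] = (
    dataSwiss['Municipality']
    .str.normalize('NFKD')
    .str.encode('ascii', errors='ignore')
    .str.decode('utf-8')
)

print(dataSwiss['Municipality'].tolist())
```

One caveat: errors='ignore' silently drops any character that has no ASCII decomposition, so 'ß' or 'ø' would disappear rather than be replaced.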

pd.Series.str.normalize uses the unicodedata module. As per the docs:

The normal form KD (NFKD) will apply the compatibility decomposition, i.e. replace all compatibility characters with their equivalents.
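To see what the decomposition does, here is a minimal sketch using unicodedata directly. The helper name strip_accents is hypothetical and not part of pandas or the standard library. Instead of encoding to ASCII, it filters out the combining marks that NFKD produces.

```python
import unicodedata

def strip_accents(text):
    # Hypothetical helper: decompose each character, then drop the
    # combining marks (accents, umlauts) left by the decomposition.
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

# 'é' decomposes into a plain 'e' followed by a combining acute accent.
print([hex(ord(ch)) for ch in unicodedata.normalize('NFKD', 'é')])  # ['0x65', '0x301']

print(strip_accents('héllo'))
print(strip_accents('Zürich'))
```

Unlike the encode/decode chain, this variant keeps non-ASCII characters that have no decomposition and removes only the combining marks. On a column it could be applied with dataSwiss['Municipality'].map(strip_accents).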

How to replace accents in a column of a pandas dataframe
