How to find accented characters in a string in Python?

2024/10/4 19:34:18

I have a file with sentences, some of which are in Spanish and contain accented letters (e.g. é) or special characters (e.g. ¿). I have to be able to search for these characters in the sentence so I can determine if the sentence is in Spanish or English.

I've tried my best to accomplish this, but have had no luck in getting it right. Below is one of the solutions I tried, but clearly gave the wrong answer.

sentence = ¿Qué tipo es el? #in str format, received from standard open file method
sentence = sentence.decode('latin-1')
print 'é'.decode('latin-1') in sentence
>>> False

I've also tried using codecs.open(.., .., 'latin-1') to read in the file instead, but that didn't help. Then I tried u'é'.encode('latin-1'), and that didn't work.

I'm out of ideas here, any suggestions?

@icktoofay provided the solution. I ended up keeping the decoding of the file (using latin-1), but then using the Python unicode for the characters (u'é'). This required me to set the Python unicode encoding at the top of the script. The final step was to use the unicodedata.normalize method to normalize both strings, then compare accordingly. Thank you guys for the prompt and great support.

Answer

Use unicodedata.normalize on the string before checking.

Explanation

Unicode offers multiple forms to create some characters. For example, á could be represented with a single character, á, or two characters: a, then 'put a ´ on top of that'. Normalizing the string will force it to one or the other of the representations. (which representation it normalizes to depends on what you pass as the form parameter)

https://en.xdnf.cn/q/70573.html

Related Q&A

Import error with PyQt5 : undefined symbol:_ZNSt12out_of_rangeC1EPKc,versionQt_5

I had watched a video on YouTube about making your own web browser using PyQt5. Link to video: https://youtu.be/z-5bZ8EoKu4, I found it interesting and decided to try it out on my system. Please note t…

Python MySQL module

Im developing a web application that needs to interface with a MySQL database, and I cant seem to find any really good modules out there for Python.Im specifically looking for fast module, capable of h…

How to calculate correlation coefficients using sklearn CCA module?

I need to measure similarity between feature vectors using CCA module. I saw sklearn has a good CCA module available: https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.h…

cant open shape file with GeoPandas

I dont seem to be able to open the zip3.zip shape file I download from (http://www.vdstech.com/usa-data.aspx)Here is my code:import geopandas as gpd data = gpd.read_file("data/zip3.shp")this …

How to fix the error :range object is not callable in python3.6 [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.This question was caused by a typo or a problem that can no longer be reproduced. While similar q…

pyspark: Save schemaRDD as json file

I am looking for a way to export data from Apache Spark to various other tools in JSON format. I presume there must be a really straightforward way to do it.Example: I have the following JSON file jfil…

How to start a background thread when django server is up?

TL;DR In my django project, where do I put my "start-a-thread" code to make a thread run as soon as the django server is up?First of all, Happy New Year Everyone! This is a question from a n…

Can I use SQLAlchemy relationships in ORM event callbacks? Always get None

I have a User model that resembles the following:class User(db.Model):id = db.Column(db.BigInteger, primary_key=True)account_id = db.Column(db.BigInteger, db.ForeignKey(account.id))account = db.relatio…

Elegant way to transform a list of dict into a dict of dicts

I have a list of dictionaries like in this example:listofdict = [{name: Foo, two: Baz, one: Bar}, {name: FooFoo, two: BazBaz, one: BarBar}]I know that name exists in each dictionary (as well as the oth…

sharpen image to detect edges/lines in a stamped X object on paper

Im using python & opencv. My goal is to detect "X" shaped pieces in an image taken with a raspberry pi camera. The project is that we have pre-printed tic-tac-toe boards, and must image t…