Processing non-English text

2024/7/7 7:21:08

I have a Python file that reads a file given by the user, processes it, and asks questions in flash-card format. The program works fine with an English .txt file, but I get errors when I try to process a French file.

When I first encountered the error, I was running python cards.py in the Windows Command Prompt. As soon as I input the French file, I got a UnicodeEncodeError. After digging around, I found that it may have something to do with using the cmd window, so I tried IDLE instead. I didn't get any errors there, but I got weird characters like œ, à and ®.

Upon further research, I found documentation saying to pass encoding='<encoding name>' to open(). After running the program again in IDLE, this reduced the problem, but some weird characters remained. In cmd, the program no longer broke immediately, but it still failed eventually when it reached a character it could not handle.
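Garbled characters like these typically appear when a file's bytes are decoded with a different encoding than the one they were written in. A minimal sketch, assuming the file is UTF-8 and the wrong codec is cp1252 (a common Windows default; yours may differ). The sample sentence is made up for illustration:

```python
# The same UTF-8 bytes decoded with the right and with the wrong codec.
text = "Où est le cœur ?"
raw = text.encode("utf-8")     # the bytes a UTF-8 file actually contains on disk

right = raw.decode("utf-8")    # correct codec: round-trips exactly
wrong = raw.decode("cp1252")   # wrong codec: each accented letter becomes two junk characters

print(right)
print(wrong)
```

No error is raised in the second case, which is why the program keeps running and simply shows the wrong characters.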

My question: what should I implement so the program can handle ALL of the characters in the file (in any language), and why do IDLE and the command prompt handle the file differently?

EDIT: I forgot to mention that I ended up using encoding='utf-8', which gave the results described above.

Answer

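This is a common question. It seems you are running the script in cmd, whose console uses a legacy code page (for example cp437 or cp850) rather than Unicode. When print() writes to the console, Python has to encode each string into that code page. Unicode has a much larger character set than any code page, so a character the code page cannot represent raises UnicodeEncodeError. (Since Python 3.6, per PEP 528, Python writes to the Windows console through its Unicode API, so this error is now mostly seen on older versions or when output is redirected.)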
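You can reproduce the failure without a console by encoding a string to a narrow code page directly. A minimal sketch, assuming cp437 as the console encoding for illustration (run chcp in cmd to see which code page yours uses):

```python
# Encoding text for a narrow console code page.
# cp437 is an assumption for illustration, not necessarily your console's code page.
line = "Le cœur"

try:
    line.encode("cp437")  # 'œ' has no slot in cp437, so this raises
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    # exc.object is the original string; start/end delimit the offending character(s)
    print("cannot encode:", repr(exc.object[exc.start:exc.end]))

# errors="replace" substitutes '?' for anything the code page lacks instead of raising
safe = line.encode("cp437", errors="replace")
print(safe)  # b'Le c?ur'
```

Replacing unencodable characters keeps the program running at the cost of losing those characters; it does not make cmd display them. When sys.stdout is an ordinary text stream (it is not inside IDLE), Python 3.7+ can apply the same policy to all printed output with sys.stdout.reconfigure(errors="replace").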

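IDLE is built on top of tkinter's Text widget, which accepts Python's Unicode strings directly, so there is no lossy conversion to a code page when you print. That is why IDLE does not raise the error; the odd characters you saw there come from how the file was read, not from printing.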

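Finally, when you open a file without specifying an encoding, open() assumes the platform default (as reported by locale.getpreferredencoding()), which on Western-locale Windows systems is often cp1252 rather than UTF-8. If your file uses a different encoding, its bytes are decoded incorrectly and you get garbled characters, so pass the file's actual encoding explicitly through the encoding keyword argument of open().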
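The advice above can be sketched as follows. read_text is a hypothetical helper, and its candidate list ("utf-8-sig", then "cp1252") is an assumption you should adapt to the files you actually receive; "utf-8-sig" accepts plain UTF-8 and also drops a leading byte-order mark if one is present. The sample file name and contents are made up:

```python
import locale
import tempfile
from pathlib import Path


def read_text(path, encodings=("utf-8-sig", "cp1252")):
    """Hypothetical helper: decode `path` with the first candidate encoding that works.

    Returns (text, encoding_used). The candidate list is an assumption.
    """
    data = Path(path).read_bytes()
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding; try the next one
    raise ValueError(f"none of {encodings} could decode {path}")


if __name__ == "__main__":
    # What open() falls back to when encoding= is omitted (platform dependent).
    print("default:", locale.getpreferredencoding(False))

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "cartes.txt"
        # Saved as cp1252: the byte for 'œ' (0x9C) is not valid UTF-8.
        sample.write_text("bonjour,hello\nle cœur,the heart\n", encoding="cp1252")
        text, used = read_text(sample)
        print(used)
        print(text)
```

A fallback list is only a guess: cp1252 decodes almost any byte sequence, so a wrong guess can still produce garbled text silently. No code can reliably handle "any language" on its own; the dependable fix is to know, or standardize on, the encoding the files are saved in (ideally UTF-8) and pass it to open().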

https://en.xdnf.cn/q/120255.html
