Is it possible to sniff the Character encoding?

2024/5/19 17:15:55

I have a webpage that accepts CSV files. These files may be created in a variety of places. As far as I know, there is no way to specify the encoding inside a CSV file, so I cannot reliably treat all of them as UTF-8 or as any other single encoding.

Is there a way to intelligently guess the encoding of the CSV files I receive? I am working with Python, but I am willing to use language-agnostic methods too.

Answer

There is no way to determine the encoding of a file with certainty by looking only at the file itself, but you can use a heuristics-based solution, e.g. chardet.
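A minimal sketch of such a heuristic in Python, assuming the uploaded file arrives as raw `bytes`. It checks for a byte-order mark, then tries strict UTF-8, then asks chardet (`chardet.detect`, only if the package is installed), and finally falls back to cp1252 and latin-1. The order of these steps and the 0.5 confidence threshold are assumptions of this sketch, not anything chardet prescribes, and `guess_encoding` / `read_csv_rows` are hypothetical helper names.

```python
import codecs
import csv
import io

try:
    import chardet  # third-party (pip install chardet); optional here
except ImportError:
    chardet = None  # the stdlib-only steps below still work without it

# Check the UTF-32 marks first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_encoding(data: bytes) -> str:
    """Best-effort guess at the encoding of raw CSV bytes (a heuristic, not proof)."""
    # 1. A byte-order mark is the strongest signal a plain file can carry.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Non-UTF-8 text rarely decodes as strict UTF-8 by accident.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 3. Statistical guess from chardet, if available; the threshold is an assumption.
    if chardet is not None:
        result = chardet.detect(data)
        if result["encoding"] and result["confidence"] >= 0.5:
            return result["encoding"]

    # 4. Assumed fallback: cp1252 is common for CSVs exported on Windows.
    try:
        data.decode("cp1252")
        return "cp1252"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so it never fails (but may be wrong).
        return "latin-1"


def read_csv_rows(data: bytes) -> list[list[str]]:
    """Decode uploaded bytes with the guessed encoding and parse them as CSV."""
    text = data.decode(guess_encoding(data))
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline="")))
```

For example, `read_csv_rows("name,city\r\nJosé,São Paulo\r\n".encode("utf-8"))` returns `[["name", "city"], ["José", "São Paulo"]]`. Trying strict UTF-8 before chardet is a deliberate choice in this sketch: a successful UTF-8 decode is a much stronger signal than a statistical guess, while single-byte encodings such as cp1252 and latin-1 can decode almost anything and therefore prove very little.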
