HTMLParser.HTMLParser().unescape() doesn't work

2024/11/6 3:16:07

I would like to convert HTML entities back to their human-readable form, e.g. '&pound;' to '£', '&deg;' to '°', etc.

I've read several posts regarding this question

Converting html source content into readable format with Python 2.x

Decode HTML entities in Python string?

Convert XML/HTML Entities into Unicode String in Python

and according to them, I chose to use the undocumented function unescape(), but it doesn't work for me...

My code sample is like:

import HTMLParser
htmlParser = HTMLParser.HTMLParser()
decoded = htmlParser.unescape('&copy; 2013')
print decoded

When I run this Python script, the output is still:

&copy; 2013

instead of

© 2013

I'm using Python 2.X on Windows 7, in a Cygwin console. I googled and didn't find any similar problems. Could anyone help me with this?

Answer

Apparently HTMLParser.unescape was a bit more primitive before Python 2.6.

Python 2.5:

>>> import HTMLParser
>>> HTMLParser.HTMLParser().unescape('&copy;')
'&copy;'

Python 2.6/2.7:

>>> import HTMLParser
>>> HTMLParser.HTMLParser().unescape('&copy;')
u'\xa9'

UPDATE: Python 3.4+:

>>> import html
>>> html.unescape('&copy;')
'©'
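
If the same script has to run on both Python 2.6/2.7 and Python 3.4+, the two calls shown above can be hidden behind one name. This is only a sketch: the variable names `decoded`, `pound` and `degree` are made up for illustration, and the Python 2 branch is written from the session above rather than tested here.

```python
# -*- coding: utf-8 -*-
try:
    # Python 3.4+: documented module-level function
    from html import unescape
except ImportError:
    # Python 2.6/2.7: the undocumented method used in the question
    import HTMLParser
    unescape = HTMLParser.HTMLParser().unescape

# Entities from the question; on Python 2 the results are unicode objects
decoded = unescape('&copy; 2013')  # '© 2013'
pound = unescape('&pound;')        # '£'
degree = unescape('&deg;')         # '°'
```

On Python 2.5 this would still hit the older unescape() and, judging by the 2.5 session above, leave '&copy;' unchanged.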

See the 2.5 implementation vs the 2.6 implementation / 2.7 implementation
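
For an interpreter where unescape() leaves named entities untouched, as in the 2.5 session above, a manual fallback can be sketched with the standard library's entity table. This is not the code from either implementation mentioned above; `decode_entities` and `_ENTITY_RE` are hypothetical names, and only the Python 3 path has been reasoned through carefully.

```python
import re

try:
    from html.entities import name2codepoint   # Python 3
except ImportError:
    from htmlentitydefs import name2codepoint  # Python 2

try:
    unichr          # Python 2 built-in
except NameError:
    unichr = chr    # Python 3: chr() already returns a Unicode character

# Matches &name;, &#123; and &#x7B; style references
_ENTITY_RE = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')


def decode_entities(text):
    """Replace named and numeric character references with their characters."""
    def _replace(match):
        ref = match.group(1)
        try:
            if ref[:2] in ('#x', '#X'):
                return unichr(int(ref[2:], 16))   # hexadecimal reference
            if ref.startswith('#'):
                return unichr(int(ref[1:]))       # decimal reference
            return unichr(name2codepoint[ref])    # named entity, e.g. copy
        except (KeyError, ValueError, OverflowError):
            return match.group(0)                 # unknown reference: keep as-is
    return _ENTITY_RE.sub(_replace, text)


decoded = decode_entities('&copy; 2013')
```

Unknown names such as '&bogus;' are passed through unchanged, so running the function over text that contains a stray ampersand sequence does not lose data.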

