I would like to convert HTML entities back to their human-readable form, e.g. '&pound;' to '£', '&deg;' to '°', etc.
I've read several posts on this question:
Converting html source content into readable format with Python 2.x
Decode HTML entities in Python string?
Convert XML/HTML Entities into Unicode String in Python
Following them, I chose to use the undocumented method unescape() of HTMLParser, but it doesn't work for me.
My code sample looks like this:
import HTMLParser
htmlParser = HTMLParser.HTMLParser()
decoded = htmlParser.unescape('&copy; 2013')
print decoded
When I run this Python script, the output is still:
&copy; 2013
instead of
© 2013
I'm using Python 2.x on Windows 7 in a Cygwin console. I googled but didn't find any similar problems. Could anyone help me with this?
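To make the expected behaviour concrete, here is a minimal sketch of what I am trying to achieve. It is an illustration rather than a fix: the helper name decode_entities is hypothetical, and the sketch assumes that html.unescape is available on Python 3.4+ and that the HTMLParser method is the fallback on Python 2. It prints repr() of each result so that the console's encoding cannot hide or alter the decoded characters.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

try:
    # Python 3.4+: documented standard-library function
    from html import unescape
except ImportError:
    # Python 2: fall back to the undocumented HTMLParser method
    import HTMLParser
    unescape = HTMLParser.HTMLParser().unescape


def decode_entities(text):
    """Hypothetical helper: return text with HTML entities decoded."""
    return unescape(text)


for sample in (u'&copy; 2013', u'&pound;', u'&deg;'):
    # repr() shows the code points even if the console cannot render them
    print(repr(decode_entities(sample)))
```

If the repr() output still contains the literal entity text, the input string itself may differ from what I think it is (for example, it may already be escaped a second time), but that is only a guess on my part.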