Question 1

I am trying to decode u'\uf04a' in Python so that I can print it without errors. In other words, I need to convert Microsoft Windows-1252 characters to actual Unicode.

The HTML source containing the unusual characters comes from here: http://members.lovingfromadistance.com/showthread.php?12338-HAVING-SECOND-THOUGHTS

You can read about u'\uf04a' and u'\uf04c' here: http://www.fileformat.info/info/unicode/char/f04a/index.htm

One example looks like this:

"Oh god please some advice ":

Out[408]: u'Oh god please some advice \uf04c'

Here is a test string, followed by what I tried:

thread = u'who are you \uf04a Why you are so harsh to her \uf04c'
thread.decode('utf8')
print u'\uf04a'
print u'\uf04a'.decode('utf8') # error!!!

'charmap' codec can't encode character u'\uf04a' in position 1526: character maps to undefined
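The message above is an encoding error, not a decoding one: to print, Python has to encode the string for the output device, and U+F04A has no equivalent in the target code page. A minimal Python 3 sketch, assuming the output encoding is cp1252 (which Python implements through its 'charmap' codec machinery), reproduces the failure and shows the built-in 'replace' error handler substituting '?' for anything that cannot be represented:

```python
# Sketch in Python 3. ASSUMPTION: the console/output encoding is cp1252.
thread = 'who are you \uf04a Why you are so harsh to her \uf04c'

try:
    # Strict encoding fails on the private use codepoints U+F04A and U+F04C.
    thread.encode('cp1252')
except UnicodeEncodeError as exc:
    print('strict encode failed:', exc.reason)

# errors='replace' writes '?' for every character cp1252 cannot represent.
safe = thread.encode('cp1252', errors='replace').decode('cp1252')
print(safe)
```

In Python 2 the equivalent call on a unicode object is thread.encode('cp1252', 'replace').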

With the help of two Python scripts (see References), I successfully converted u'\x92', but I am still stuck with u'\uf04a'. Any suggestions?
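A stray u'\x92' usually means Windows-1252 bytes were decoded as Latin-1: byte 0x92 is a right single quotation mark in cp1252 but a C1 control character in Latin-1. A minimal Python 3 sketch of undoing that mistake follows; it is an alternative to a character-mapping script such as killgremlins.py, not that script's code, and the input bytes are hypothetical:

```python
# Sketch in Python 3: repair text that was decoded as Latin-1 when the
# underlying bytes were really Windows-1252 (cp1252).
raw = b"I don\x92t know"  # hypothetical cp1252-encoded input

wrong = raw.decode('latin-1')  # 0x92 becomes the C1 control u'\x92'
right = raw.decode('cp1252')   # 0x92 becomes U+2019 RIGHT SINGLE QUOTATION MARK

# Latin-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, so re-encoding
# with Latin-1 recovers the original bytes, which are then decoded correctly.
repaired = wrong.encode('latin-1').decode('cp1252')

print(ascii(repaired))  # ascii() shows the repaired codepoint as an escape
```

Note that cp1252 leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so decode('cp1252') raises UnicodeDecodeError on them unless an error handler is given. The trick cannot help with u'\uf04a' either: that codepoint lies above U+00FF, so it cannot be encoded as Latin-1 in the first place.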

References

https://github.com/AnthonyBRoberts/NNS/blob/master/tools/killgremlins.py

Handling non-standard American English Characters and Symbols in a CSV, using Python

Solution:

Based on the comments below, I replaced these characters with a question mark ('?'):

thread = u'who are you \uf04a Why you are so harsh to her \uf04c'
thread = thread.replace(u'\uf04a', '?')
thread = thread.replace(u'\uf04c', '?')
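The two replace() calls above cover only U+F04A and U+F04C. If other private use codepoints may turn up, a Python 3 sketch can replace every character whose Unicode general category is 'Co' (private use) in one pass; the helper name replace_private_use is hypothetical:

```python
import unicodedata


def replace_private_use(text, replacement='?'):
    """Replace every private use character with `replacement`.

    Hypothetical helper. Category 'Co' covers the BMP Private Use Area
    (U+E000 to U+F8FF) and the supplementary private use planes 15 and 16.
    """
    return ''.join(
        replacement if unicodedata.category(ch) == 'Co' else ch
        for ch in text
    )


thread = 'who are you \uf04a Why you are so harsh to her \uf04c'
print(replace_private_use(thread))
```

Passing replacement='' drops the characters instead of marking them.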

Hope this is helpful to other beginners.

Answer

The notation u'\uf04a' denotes the Unicode codepoint U+F04A, which is by definition a private use codepoint. This means that the Unicode standard does not assign any character to it, and never will; instead, it can be used by private agreements.
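The private use status can be confirmed with the standard unicodedata module. In this Python 3 sketch, the general category is 'Co' (Other, private use), and because no character is assigned to the codepoint, it has no character name:

```python
import unicodedata

cp = '\uf04a'

print(f'U+{ord(cp):04X}')        # the codepoint in U+XXXX notation
print(unicodedata.category(cp))  # 'Co' means Other, private use

# unicodedata.name() raises ValueError for a codepoint without a name
# unless a default value is supplied, as it is here.
print(unicodedata.name(cp, '<no name: no character assigned>'))
```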

It is thus meaningless to talk about printing it. If there is a private agreement on using it in some context, then you print it using a font that has a glyph allocated to that codepoint. Different agreements and different fonts may allocate completely different characters and glyphs to the same codepoint.
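If you can establish which private agreement produced the text, you can translate the codepoints into standard characters. One plausible candidate, and it is only an assumption that this forum's text used it, is the Microsoft symbol-font convention, in which a Wingdings glyph at byte 0xNN is stored at U+F0NN. Under that convention U+F04A is Wingdings 'J' (a smiley) and U+F04C is Wingdings 'L' (a frowning face), which would fit the example sentences. A Python 3 sketch with a hypothetical, deliberately partial table:

```python
# Sketch in Python 3. ASSUMPTION: the text follows the Wingdings symbol-font
# convention (glyph byte 0xNN stored at U+F0NN). Verify against your data.
WINGDINGS_PUA_TO_UNICODE = {  # hypothetical, partial mapping table
    0xF04A: '\u263A',  # Wingdings 'J' -> U+263A WHITE SMILING FACE
    0xF04C: '\u2639',  # Wingdings 'L' -> U+2639 WHITE FROWNING FACE
}

thread = 'who are you \uf04a Why you are so harsh to her \uf04c'

# str.translate() accepts a dict mapping codepoint integers to replacement strings;
# characters not in the table are left unchanged.
converted = thread.translate(WINGDINGS_PUA_TO_UNICODE)
print(ascii(converted))
```

The resulting characters still need a console encoding (such as UTF-8) that can represent them; cp1252 cannot.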

It is possible that U+F04A is a result of erroneous processing (e.g., wrong conversions) of character data at some earlier phase.

how to convert u\uf04a to unicode in python [duplicate]
