Python - replace unicode emojis with ASCII characters

2024/9/20 0:22:58

I have an issue with one of my current weekend projects. I am writing a Python script that fetches data from different sources and then sends everything to an ESC/POS printer. As you might imagine, POS printers don't exactly like emojis...

So text like this:

可爱!!!!!!!!😍😍😍😍😍😍😍😝

gives me this character string:

'\u53ef\u7231!!!!!!!!\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f61d'

The result that comes out of the printer is, of course, quite different from what I would like. So I need to replace these non-ASCII characters with something else. I don't really care about the first characters, but I do care about the emojis. Using something like unidecode(str(text)) will at least strip them out, but I want to convert them into something more useful: either classic smileys like [:-D] or names like [SMILING FACE WITH HEART-SHAPED EYES].

My problem is... how would one go about doing this? Manually creating a lookup table for the most common emojis seems a bit tedious, so I am wondering if there is something else I can do.

Answer

With the tip about unicodedata.name and some further research I managed to put this thing together:

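import unicodedata
from unidecode import unidecode

def deEmojify(inputString):
    returnString = ""
    for character in inputString:
        try:
            # Plain ASCII characters pass through unchanged.
            character.encode("ascii")
            returnString += character
        except UnicodeEncodeError:
            # First choice: unidecode's ASCII transliteration.
            replaced = unidecode(str(character))
            if replaced != '':
                returnString += replaced
            else:
                try:
                    # Second choice: the official Unicode name.
                    returnString += "[" + unicodedata.name(character) + "]"
                except ValueError:
                    # Last resort: the character has no name.
                    returnString += "[x]"
    return returnString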

Basically, it first tries to find the most appropriate ASCII representation. If that fails, it tries the Unicode name, and if even that fails, it replaces the character with a simple marker.
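To see the last two fallback steps in isolation, here is a minimal sketch that uses only the standard library. The helper name describe is made up for illustration; it relies on the optional default argument of unicodedata.name instead of catching ValueError, which should behave the same way for this purpose.

```python
import unicodedata

def describe(character):
    # 'describe' is a hypothetical helper, not part of the answer above.
    # unicodedata.name returns the default (None here) instead of raising
    # ValueError when the character has no name.
    name = unicodedata.name(character, None)
    return "[" + name + "]" if name is not None else "[x]"

print(describe("\U0001f60d"))  # [SMILING FACE WITH HEART-SHAPED EYES]
print(describe("\ue000"))      # [x]  (private-use code points have no name)
```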

For example, taking this string:

abcdšeđfčgžhÅiØjÆk 可爱!!!!!!!!😍😍😍😍😍😍😍😝

And running the function:

string = u'abcdšeđfčgžhÅiØjÆk \u53ef\u7231!!!!!!!!\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f61d'
print(deEmojify(string))

Will produce the following result:

abcdsedfcgzhAiOjAEk[x] Ke Ai !!!!!!!![SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES]
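If the long names take up too much paper, one option is to put a small hand-made table of classic smileys in front of the name lookup, so only emojis missing from the table fall back to their Unicode name. The sketch below is an assumption-laden illustration: the SMILEYS table, its three entries, and the function name emoji_to_text are all invented here. It also leaves out unidecode, so any other non-ASCII character gets its Unicode name too.

```python
import unicodedata

# Hypothetical, hand-picked table; extend it with whichever emojis matter to you.
SMILEYS = {
    "\U0001f600": "[:-D]",  # GRINNING FACE
    "\U0001f60d": "[<3]",   # SMILING FACE WITH HEART-SHAPED EYES
    "\U0001f61d": "[X-P]",  # FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES
}

def emoji_to_text(text):
    parts = []
    for character in text:
        if character.isascii():          # str.isascii needs Python 3.7+
            parts.append(character)
        elif character in SMILEYS:       # short smiley, if we defined one
            parts.append(SMILEYS[character])
        else:                            # otherwise the Unicode name, or "x"
            parts.append("[" + unicodedata.name(character, "x") + "]")
    return "".join(parts)

print(emoji_to_text("ok \U0001f60d\U0001f61d\U0001f622"))
# ok [<3][X-P][CRYING FACE]
```

The table only needs entries for the emojis you actually expect to see; everything else still prints something readable.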

