How can I replace Unicode characters in Python?

2024/10/11 13:18:57

I'm pulling Twitter data via their API and one of the tweets has a special character (the right apostrophe) and I keep getting an error saying that Python can't map or character map the character. I've looked all over the Internet, but I have yet to find a solution for this issue. I just want to replace that character with either an apostrophe that Python will recognize, or an empty string (essentially removing it). I'm using Python 3.3. Any input on how to fix this problem? It may seem simple, but I'm a newbie at Python.

Edit: Here is the function I'm using to try to filter out the Unicode characters that throw errors.

    @staticmethod
    def UnicodeFilter(var):
        temp = var
        temp = temp.replace(chr(2019), "'")
        temp = Functions.ToSQL(temp)
        return temp

Also, when running the program, my error is as follows.

    'charmap' codec can't encode character '\u2019' in position 59: character maps to <undefined>

Edit: Here is a sample of my source code:

    import json
    import mysql.connector
    import unicodedata
    from MySQLCL import MySQLCL

    class Functions(object):
        """This is a class for Python functions"""

        @staticmethod
        def Clean(string):
            temp = str(string)
            temp = temp.replace("'", "").replace("(", "").replace(")", "").replace(",", "").strip()
            return temp

        @staticmethod
        def ParseTweet(string):
            for x in range(0, len(string)):
                tweetid = string[x]["id_str"]
                tweetcreated = string[x]["created_at"]
                tweettext = string[x]["text"]
                tweetsource = string[x]["source"]
                truncated = string[x]["truncated"]
                inreplytostatusid = string[x]["in_reply_to_status_id"]
                inreplytouserid = string[x]["in_reply_to_user_id"]
                inreplytoscreenname = string[x]["in_reply_to_screen_name"]
                geo = string[x]["geo"]
                coordinates = string[x]["coordinates"]
                place = string[x]["place"]
                contributors = string[x]["contributors"]
                isquotestatus = string[x]["is_quote_status"]
                retweetcount = string[x]["retweet_count"]
                favoritecount = string[x]["favorite_count"]
                favorited = string[x]["favorited"]
                retweeted = string[x]["retweeted"]
                possiblysensitive = string[x]["possibly_sensitive"]
                language = string[x]["lang"]
                print(Functions.UnicodeFilter(tweettext))
                #print("INSERT INTO tweet(ExTweetID, TweetText, Truncated, InReplyToStatusID, InReplyToUserID, InReplyToScreenName, IsQuoteStatus, RetweetCount, FavoriteCount, Favorited, Retweeted, Language, TweetDate, TweetSource, PossiblySensitive) VALUES (" + str(tweetid) + ", '" + Functions.UnicodeFilter(tweettext) + "', " + str(truncated) + ", " + Functions.CheckNull(inreplytostatusid) + ", " + Functions.CheckNull(inreplytouserid) + ", '" + Functions.CheckNull(inreplytoscreenname) + "', " + str(isquotestatus) + ", " + str(retweetcount) + ", " + str(favoritecount) + ", " + str(favorited) + ", " + str(retweeted) + ", '" + str(language) + "', '" + Functions.ToSQL(tweetcreated) + "', '" + Functions.ToSQL(tweetsource) + "', " + str(possiblysensitive) + ")")
                #MySQLCL.Set("INSERT INTO tweet(ExTweetID, TweetText, Truncated, InReplyToStatusID, InReplyToUserID, InReplyToScreenName, IsQuoteStatus, RetweetCount, FavoriteCount, Favorited, Retweeted, Language, TweetDate, TweetSource, PossiblySensitive) VALUES (" + str(tweetid) + ", '" + tweettext + "', " + str(truncated) + ", " + Functions.CheckNull(inreplytostatusid) + ", " + Functions.CheckNull(inreplytouserid) + ", '" + Functions.CheckNull(inreplytoscreenname) + "', " + str(isquotestatus) + ", " + str(retweetcount) + ", " + str(favoritecount) + ", " + str(favorited) + ", " + str(retweeted) + ", '" + language + "', '" + tweetcreated + "', '" + str(tweetsource) + "', " + str(possiblysensitive) + ")")

        @staticmethod
        def ToBool(variable):
            if variable.lower() == 'true':
                return True
            elif variable.lower() == 'false':
                return False

        @staticmethod
        def CheckNull(var):
            if var == None:
                return ""
            else:
                return var

        @staticmethod
        def ToSQL(var):
            temp = var
            temp = temp.replace("'", "''")
            return str(temp)

        @staticmethod
        def UnicodeFilter(var):
            temp = var
            #temp = temp.replace(chr(2019), "'")
            unicodestr = unicode(temp, 'utf-8')
            if unicodestr != temp:
                temp = "'"
            temp = Functions.ToSQL(temp)
            return temp

ekhumoro's response was correct.

Answer

There seem to be two problems with your program.

Firstly, you are passing the wrong code point to chr(). The hexadecimal code point of the character is 0x2019, but you are passing in the decimal number 2019 (which equates to 0x7e3 in hexadecimal). So you need to do either:

    temp = temp.replace(chr(0x2019), "'") # hexadecimal

or:

    temp = temp.replace(chr(8217), "'") # decimal

in order to replace the character correctly.
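
As a quick check of the two forms above, here is a minimal sketch. The names `RIGHT_QUOTE`, `wrong_char` and `replace_right_quote` are made up for illustration and are not part of your program:

```python
RIGHT_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# The code point is 0x2019 in hexadecimal, which is 8217 in decimal.
assert ord(RIGHT_QUOTE) == 0x2019 == 8217

# chr(2019) is a different character altogether (code point 0x7e3).
wrong_char = chr(2019)


def replace_right_quote(text):
    """Replace U+2019 with a plain ASCII apostrophe (hypothetical helper)."""
    return text.replace(chr(0x2019), "'")


sample = "It\u2019s a test"
fixed = replace_right_quote(sample)
print(fixed)
```

Because `chr(2019)` never occurs in the tweet text, the original `replace()` call silently did nothing, which is why the error persisted.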

Secondly, the reason you are getting the error is that some other part of your program (probably the database backend) is trying to encode Unicode strings using some encoding other than UTF-8. It's hard to be more precise about this, because you did not include the full traceback in your question. However, the reference to "charmap" suggests that a Windows code page is being used (but not cp1252, which can encode this character); or an ISO encoding (but not iso8859-1, aka latin1); or possibly KOI8_R.
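
The same kind of error can be reproduced without a database. The sketch below uses cp437 purely as an example of a charmap-based code page that lacks U+2019; it is an assumption for illustration, not a claim about which encoding your setup actually uses. `try_encode` is a hypothetical helper:

```python
text = "It\u2019s a test"


def try_encode(value, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# cp437 has no mapping for U+2019, so this yields a 'charmap' error message.
cp437_result = try_encode(text, "cp437")

# cp1252 maps U+2019 to a single byte, and UTF-8 can encode any code point.
cp1252_result = try_encode(text, "cp1252")
utf8_result = try_encode(text, "utf-8")

print(cp437_result)
print(cp1252_result)
print(utf8_result)
```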

Anyway, the correct way to deal with this issue is to ensure all parts of your program (and especially the database) use UTF-8. If you do that, you won't have to mess about replacing characters anymore.
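
As a rough sketch of what "use UTF-8 everywhere" means for output streams, the example below wraps a binary stream so that everything written to it is encoded as UTF-8. An in-memory `io.BytesIO` stands in for a real file or console here, and `make_utf8_writer` is a hypothetical helper:

```python
import io


def make_utf8_writer(binary_stream):
    """Wrap a binary stream so text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")


buffer = io.BytesIO()  # stand-in for a real binary file or pipe
writer = make_utf8_writer(buffer)
writer.write("It\u2019s working")
writer.flush()

utf8_bytes = buffer.getvalue()
print(utf8_bytes)
```

For the database side, MySQL Connector/Python accepts a `charset` connection argument (for example `charset="utf8mb4"`). That part is not shown here because it needs a live server, and the table and column character sets would need to match as well.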
