\ufeff Invalid character in identifier

2024/10/15 6:19:30

I have the following code:

import urllib.request
try:
    url = "https://www.google.com/search?q=test"
    headers = {}
    usag = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0'
    headers['User-Agent'] = usag.encode('utf-8-sig')
    req = urllib.request.Request(url, headers=headers)
    resp = urllib.request.urlopen(req)
    respData = resp.read()
    saveFile = open('withHeaders.txt','w')
    saveFile.write(str(respData))
    saveFile.close()
except Exception as e:
    print(str(e))

It gives me the following error:

D:\virtualenv\samples\urllibb>python 1.py
  File "1.py", line 35
    usag = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0'\ufeff
                                                                                              ^
SyntaxError: invalid character in identifier

I can't see the \ufeff in my code though.

Answer

\ufeff is the ZERO WIDTH NO-BREAK SPACE codepoint; it is not rendered when printing. It is used as a byte order mark (BOM) in UTF-16 and UTF-32 to record the order in which the encoded bytes are to be decoded (big-endian or little-endian).

UTF-8 doesn't need a BOM (it has only one fixed ordering of the bytes, so there is no alternative to track), but Microsoft decided it was a handy signature character for their tools to tell UTF-8 files apart from 8-bit encodings (such as most of the Windows codepages).
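To make the two paragraphs above concrete, here is a small sketch using only the standard library. It shows that the codepoint is a real character despite being invisible, and what bytes it turns into under each encoding. The variable names are just for illustration.

```python
import codecs
import unicodedata

bom = "\ufeff"

# It is one real character, even though nothing visible is printed for it.
print(len(bom), unicodedata.name(bom))

# UTF-8 has a single byte order, so the BOM always encodes to the same bytes.
utf8_bytes = bom.encode("utf-8")
print(utf8_bytes)

# UTF-16 has two possible byte orders; the BOM bytes reveal which one is in use.
utf16_le = bom.encode("utf-16-le")
utf16_be = bom.encode("utf-16-be")
print(utf16_le, utf16_be)
```

The standard codecs module exposes the same byte sequences as constants (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE), which is handy when checking the first bytes of a file.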

I suspect you are using a Microsoft text editor such as Notepad to save your code. Notepad writes a BOM at the start of UTF-8 files. Python only recognizes and skips a BOM when it is the very first thing in a UTF-8 source file; anywhere else it is an invalid character. You probably saved the file with Notepad, then continued with a different tool to add more code to the start, and the BOM got caught in the middle.

Either delete the whole line and the next and re-type them, or select from the closing quote of the string you define until just before the h of headers on the next line, delete that part and re-insert a newline and enough indentation.

If your editor supports using escape sequences when searching and replacing (SublimeText does when in regex mode, for example), you could just use that to search for the character and replace it with an empty string. In SublimeText, switch on regex support and search for \x{feff}, replacing those occurrences with an empty string.
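If your editor can't search for the character, the same clean-up can be sketched in a few lines of Python. The helper names find_boms and strip_boms and the sample text are made up for this illustration; they are not part of any library.

```python
BOM = "\ufeff"

def find_boms(text):
    """Return 1-based (line, column) positions of every BOM in text."""
    positions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        col = line.find(BOM)
        while col != -1:
            positions.append((line_no, col + 1))
            col = line.find(BOM, col + 1)
    return positions

def strip_boms(text):
    """Return text with every BOM codepoint removed."""
    return text.replace(BOM, "")

# Hypothetical source text with a stray BOM after the closing quote on line 2.
sample = "headers = {}\nusag = 'Firefox/25.0'\ufeff\nprint(usag)\n"

print(find_boms(sample))
cleaned = strip_boms(sample)
print(find_boms(cleaned))
```

To apply this to a real file, read it with open(path, encoding='utf-8'), pass the contents through strip_boms, and write the result back with the same encoding.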

The Python utf-8-sig encoding that you are using here also includes that BOM:

headers['User-Agent'] = usag.encode('utf-8-sig')

HTTP headers should not include that codepoint either. HTTP headers typically stick to Latin-1 instead; even ASCII would suffice here, but otherwise use 'utf-8' (no -sig).
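A short comparison shows the difference between the two codecs on the string from the question. This is only a sketch to illustrate the point above:

```python
usag = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0'

with_sig = usag.encode('utf-8-sig')  # prepends the 3-byte UTF-8 BOM
plain = usag.encode('utf-8')         # no BOM

print(with_sig[:6])
print(plain[:6])

# The string is pure ASCII, so ASCII, Latin-1 and UTF-8 all give the same bytes.
same_bytes = plain == usag.encode('ascii') == usag.encode('latin-1')
print(same_bytes)
```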

You don't really need str.encode() for the header value at all; you could just define a bytestring:

headers = {}
usag = b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0'
headers['User-Agent'] = usag

Note the b prefix to the string literal.
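As a further sketch, and an assumption beyond what the answer above states: in Python 3, urllib.request.Request also accepts a plain str header value, so the encoding step can be dropped entirely. The snippet only builds the request object and inspects it; nothing is sent over the network until urllib.request.urlopen(req) is called.

```python
import urllib.request

url = "https://www.google.com/search?q=test"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0'

# Build the request with a plain str header value (no BOM, no encode call).
req = urllib.request.Request(url, headers={'User-Agent': user_agent})

# Request stores header names capitalized, hence 'User-agent' in the lookup.
print(req.get_header('User-agent'))
```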
