SyntaxError: Non-ASCII character. Python

2024/10/8 8:27:50

Could somebody tell me which character is a non-ASCII character in the following:

Columns(str) – comma-seperated list of values. Works only if format is tab or xls. For UnitprotKB, some possible columns are: id, entry name, length, organism. Some column names must be followed by a database name (i.e. ‘database(PDB)’). Again see uniprot website for more details. See also _valid_columns for the full list of column keyword.

Essentially I am defining a method on a class and trying to give it a docstring that describes how it works:

def test(self,uniprot_id):
    '''
    Same as the UniProt.search() method arguments:
    search(query, frmt='tab', columns=None, include=False, sort='score', compress=False, limit=None, offset=None, maxTrials=10)
    query (str) -- query must be a valid uniprot query. See http://www.uniprot.org/help/text-search, http://www.uniprot.org/help/query-fields See also example below
    frmt (str) -- a valid format amongst html, tab, xls, asta, gff, txt, xml, rdf, list, rss. If tab or xls, you can also provide the columns argument. (default is tab)
    include (bool) -- include isoform sequences when the frmt parameter is fasta. Include description when frmt is rdf.
    sort (str) -- by score by default. Set to None to bypass this behaviour
    compress (bool) -- gzip the results
    limit (int) -- Maximum number of results to retrieve.
    offset (int) -- Offset of the first result, typically used together with the limit parameter.
    maxTrials (int) -- this request is unstable, so we may want to try several time.
    Columns(str) -- comma-seperated list of values. Works only if format is tab or xls. For UnitprotKB, some possible columns are: id, entry name, length, organism. Some column names must be followed by a database name (i.e. ‘database(PDB)’). Again see uniprot website for more details. See also _valid_columns for the full list of column keyword.
    '''
    u = UniProt()
    uniprot_entry = u.search(uniprot_id)
    return uniprot_entry

Without line 52, i.e. the one beginning with 'Columns' in the docstring, this works as expected, but as soon as I describe what 'columns' is I get the following error:

SyntaxError: Non-ASCII character '\xe2' in file /home/cw00137/Documents/Python/Identify_gene.py on line 52, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Does anybody know what is going on?

Answer

You are using 'fancy' curly quotes in that line:

>>> u'‘database(PDB)’'
u'\u2018database(PDB)\u2019'

That's a U+2018 LEFT SINGLE QUOTATION MARK at the start and U+2019 RIGHT SINGLE QUOTATION MARK at the end.

Use ASCII quotes (U+0027 APOSTROPHE or U+0022 QUOTATION MARK) or declare an encoding other than ASCII for your source.
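
If you would rather keep the typographic quotes, a PEP 263 declaration looks roughly like the sketch below. It assumes the file is really saved as UTF-8, and the function name describe_columns is made up for illustration; it is not part of your code or of the UniProt wrapper.

```python
# -*- coding: utf-8 -*-
# The declaration above must sit on the first or second line of the file (PEP 263).
# Assumption: the editor actually saves this file as UTF-8.


def describe_columns():
    '''Some column names must be followed by a database name (i.e. ‘database(PDB)’).'''
    # Hypothetical helper: it only returns its own docstring so the
    # non-ASCII quotes can be inspected after the file has been parsed.
    return describe_columns.__doc__
```

On Python 3 the default source encoding is already UTF-8, so the declaration only matters on Python 2, which is what the error message in the question comes from.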

You are also using a U+2013 EN DASH:

>>> u'Columns(str) –'
u'Columns(str) \u2013'

Replace that with a U+002D HYPHEN-MINUS.

All three characters encode to UTF-8 with a leading E2 byte:

>>> u'\u2013 \u2018 \u2019'.encode('utf8')
'\xe2\x80\x93 \xe2\x80\x98 \xe2\x80\x99'

which you then see reflected in the SyntaxError exception message.
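
To answer "which character is it?" for any piece of source text, something along these lines can list every non-ASCII character with its position and UTF-8 bytes. This is only a sketch for Python 3; the function name find_non_ascii and the sample string are my own, not anything from your file.

```python
import unicodedata


def find_non_ascii(text):
    """Return one tuple per non-ASCII character:
    (line number, column, code point, Unicode name, UTF-8 bytes as hex)."""
    hits = []
    # Split on "\n" only, so line numbers match what the interpreter reports.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((
                    line_no,
                    col,
                    "U+%04X" % ord(ch),
                    unicodedata.name(ch, "UNKNOWN"),
                    ch.encode("utf-8").hex(),
                ))
    return hits


# Sample text containing the three characters discussed above.
sample = "Columns(str) \u2013 see \u2018database(PDB)\u2019"
for hit in find_non_ascii(sample):
    print(hit)
```

For a real file you would read it as text first, for example with open(path, encoding="utf-8"), where path is whatever file you want to check, and pass the contents to the function.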

You may want to avoid using these characters in the first place. Your OS may be substituting typographic quotes and dashes as you type, or you may be writing your code in a word processor instead of a plain text editor, which makes the same substitutions for you. You probably want to switch that feature off.
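
If the characters are already in the file, a small clean-up pass can swap them for their ASCII counterparts. The following is a rough sketch, not a complete solution: the mapping covers only the three characters identified in this answer, and the names ASCII_REPLACEMENTS and to_ascii_punctuation are invented for the example.

```python
# Assumed mapping: only the three characters found in the question's docstring.
ASCII_REPLACEMENTS = {
    0x2018: "'",  # LEFT SINGLE QUOTATION MARK  -> APOSTROPHE
    0x2019: "'",  # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
    0x2013: "-",  # EN DASH                     -> HYPHEN-MINUS
}


def to_ascii_punctuation(text):
    """Replace the typographic quotes and en dash with plain ASCII characters."""
    # str.translate accepts a dict that maps code points to replacement strings.
    return text.translate(ASCII_REPLACEMENTS)


cleaned = to_ascii_punctuation("Columns(str) \u2013 (i.e. \u2018database(PDB)\u2019)")
print(cleaned)
```

Other typographic characters (curly double quotes, em dashes, non-breaking spaces) would need their own entries in the mapping.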

