xml.etree.ElementTree.ParseError: not well-formed

2024/11/13 10:25:25

I have the following code:

from xml.etree import ElementTree

file_path = 'some_file_path'
document = ElementTree.parse(file_path, ElementTree.XMLParser(encoding='utf-8'))

If my XML looks like the following it gives me the error: "xml.etree.ElementTree.ParseError: not well-formed"

<?xml version="1.0" encoding="utf-8" ?>
<pages>
<page id="1">
<textbox id="0">
<textline bbox="53.999,778.980,130.925,789.888">
<text font="GCCBBY+TT228t00" bbox="60.598,778.980,64.594,789.888" size="10.908">H</text>
<text font="GCCBBY+TT228t00" bbox="64.558,778.980,70.558,789.888" size="10.908">-</text>
<text>
</text>
</textline>
</textbox>
</page>
</pages>

In Sublime or Notepad++ I see highlighted characters such as ACK, DC4, or STX, which seem to be the culprit (one of them appears as a "-" in the second "text" node of the XML above). If I remove these characters, parsing works. What are these characters, and how can I fix this?

Answer

Running your code as follows works fine:

from xml.etree import ElementTree
from io import StringIO  # Python 3 (Python 2 used: from StringIO import StringIO)

xml_content = """<?xml version="1.0" encoding="utf-8" ?>
<pages>
<page id="1">
<textbox id="0">
<textline bbox="53.999,778.980,130.925,789.888">
<text font="GCCBBY+TT228t00" bbox="60.598,778.980,64.594,789.888" size="10.908">H</text>
<text font="GCCBBY+TT228t00" bbox="64.558,778.980,70.558,789.888" size="10.908">-</text>
<text>
</text>
</textline>
</textbox>
</page>
</pages>"""

print("parsing xml document")
# using StringIO to simulate reading from a file
document = ElementTree.parse(StringIO(xml_content), ElementTree.XMLParser(encoding='utf-8'))
for elem in document.iter():
    print(elem.tag)

And the output is as expected:

parsing xml document
pages
page
textbox
textline
text
text
text

So the XML as posted parses without error. The issue is likely in how the file is being copied and pasted from Notepad++; it may be adding some special characters, so try another editor.
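
The characters named in the question (ACK, DC4, STX) are ASCII control characters. XML 1.0 allows no character below 0x20 other than tab, line feed, and carriage return, which would explain the "not well-formed" error on the original file even though the pasted text parses. If cleaning the file by hand is impractical, one option is to strip those bytes before parsing. The following is only a minimal sketch: it assumes the file is UTF-8 (where such bytes never occur inside a multi-byte sequence) and that silently dropping the characters is acceptable. The helper name parse_lenient and the sample bytes are made up for illustration.

```python
import io
import re
from xml.etree import ElementTree

# C0 control bytes that XML 1.0 forbids: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
ILLEGAL_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_lenient(raw):
    """Hypothetical helper: drop forbidden control bytes, then parse."""
    cleaned = ILLEGAL_XML_BYTES.sub(b"", raw)
    return ElementTree.parse(io.BytesIO(cleaned))


# Made-up sample: an STX byte (0x02) stands in for the highlighted character.
raw = (
    b'<?xml version="1.0" encoding="utf-8" ?>\n'
    b"<pages><text>H</text><text>\x02</text></pages>"
)

# The unmodified bytes are rejected by the parser.
try:
    ElementTree.fromstring(raw)
    raw_parses = True
except ElementTree.ParseError as err:
    raw_parses = False
    print("unmodified input fails:", err)

# After stripping the control byte, the document parses.
document = parse_lenient(raw)
print([elem.tag for elem in document.iter()])
```

For a real file, the same helper could be fed the raw bytes, for example by opening file_path in 'rb' mode and passing the result of read() to parse_lenient. If the stripped characters carry meaning (for example glyphs from a PDF extraction), replacing them with a placeholder instead of deleting them may be the better choice.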
