Question 1

I have the following code:

from xml.etree import ElementTree

file_path = 'some_file_path'
document = ElementTree.parse(file_path, ElementTree.XMLParser(encoding='utf-8'))

If my XML looks like the following, parsing fails with the error "xml.etree.ElementTree.ParseError: not well-formed":

<?xml version="1.0" encoding="utf-8" ?>
<pages>
<page id="1">
<textbox id="0">
<textline bbox="53.999,778.980,130.925,789.888">
<text font="GCCBBY+TT228t00" bbox="60.598,778.980,64.594,789.888" size="10.908">H</text>
<text font="GCCBBY+TT228t00" bbox="64.558,778.980,70.558,789.888" size="10.908">-</text>
<text>
</text>
</textline>
</textbox>
</page>
</pages>

In Sublime Text or Notepad++ I see highlighted characters such as ACK, DC4, or STX, which seem to be the culprit (one of them appears as a "-" in the second "text" node of the XML above). If I remove these characters, parsing works. What are these characters, and how can I fix this?
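For reference, ACK, DC4, and STX are the editor's labels for the ASCII control codes 0x06, 0x14, and 0x02. XML 1.0 permits only tab, line feed, and carriage return below 0x20, so a parser is expected to reject them. A minimal sketch for locating such characters in already-decoded text follows; the helper name `find_illegal_xml_chars` and the sample string are hypothetical, not taken from the real file.

```python
import re

# Control characters that XML 1.0 does not allow: everything below 0x20
# except tab (0x09), line feed (0x0A), and carriage return (0x0D).
# STX (0x02), ACK (0x06), and DC4 (0x14) all fall in this range.
ILLEGAL_XML_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def find_illegal_xml_chars(text):
    """Return (offset, hex code) pairs for each disallowed control character."""
    return [(m.start(), hex(ord(m.group())))
            for m in ILLEGAL_XML_CHARS.finditer(text)]


# Hypothetical sample: an ACK character inside the second <text> node.
sample = '<text>H</text><text>\x06</text>'
print(find_illegal_xml_chars(sample))
```

Running this on the decoded file contents shows where each offending character sits, which helps confirm that they are in the file itself rather than introduced by an editor.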

Question 2

I ran your code as follows, and it works fine:

from xml.etree import ElementTree
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO

xml_content = """<?xml version="1.0" encoding="utf-8" ?>
<pages>
<page id="1">
<textbox id="0">
<textline bbox="53.999,778.980,130.925,789.888">
<text font="GCCBBY+TT228t00" bbox="60.598,778.980,64.594,789.888" size="10.908">H</text>
<text font="GCCBBY+TT228t00" bbox="64.558,778.980,70.558,789.888" size="10.908">-</text>
<text>
</text>
</textline>
</textbox>
</page>
</pages>"""

print("parsing xml document")
# using StringIO to simulate reading from a file
document = ElementTree.parse(StringIO(xml_content), ElementTree.XMLParser(encoding='utf-8'))

# print the tag of every element in document order
for elem in document.iter():
    print(elem.tag)

And the output is as expected:

parsing xml document
pages
page
textbox
textline
text
text
text

So the issue is probably in how the file is being copied and pasted from Notepad++. It may be adding special characters, so try another editor.
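If the control characters turn out to be in the file itself, one possible workaround is to strip them before parsing. The sketch below assumes the file is UTF-8 and that dropping the characters is acceptable for your data; `parse_xml_lenient` and the sample bytes are hypothetical names and values used only for illustration.

```python
import re
from xml.etree import ElementTree

# Control characters that XML 1.0 does not allow (all C0 controls except
# tab, line feed, and carriage return).
ILLEGAL_XML_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def parse_xml_lenient(raw_bytes):
    """Decode as UTF-8, drop disallowed control characters, then parse."""
    text = raw_bytes.decode('utf-8')
    cleaned = ILLEGAL_XML_CHARS.sub('', text)
    # Re-encode so the parser can honour the encoding in the XML declaration.
    return ElementTree.fromstring(cleaned.encode('utf-8'))


# Hypothetical sample with an ACK (0x06) character in the second <text> node.
raw = (b'<?xml version="1.0" encoding="utf-8" ?>\n'
       b'<pages><text>H</text><text>\x06</text></pages>')

root = parse_xml_lenient(raw)
print([elem.tag for elem in root.iter()])
```

For a real file, the bytes could be read with `open(file_path, 'rb').read()` and passed to the helper. Note that this silently discards data, so if the characters carry meaning in the source document, fixing whatever produced the XML is the safer route.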
