Question 1

How can I read the header of an XML document in Python 3?

Ideally, I would use the defusedxml module, since its documentation says it is safer, but at this point (after hours of trying to figure this out) I'd settle for any parser.

For example, I have a document (this is actually from an exercise) that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"> <!-- this is root --><!-- CONTENTS -->
</plist>

I'm wondering how to access everything before the root node.

This seems like such a general question that I thought I would easily find an answer online, but I guess I was wrong. The closest thing I found was this question on Stack Overflow, which didn't really help (I looked into xml.sax, but couldn't find anything relevant).
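For what it's worth, xml.sax does report the DOCTYPE and comments, but not through the ContentHandler. They go to a separate lexical handler registered with the property_lexical_handler property. Below is a minimal sketch that assumes the default expat-based reader returned by make_parser(). PrologLexicalHandler and read_doctype_with_sax are hypothetical names. SAX has no event for the <?xml ...?> declaration itself, so this recovers only the DOCTYPE (name, public ID, system ID) and the comments. It also collects comments from the whole document, including the ones inside <plist>.

```python
import io
import xml.sax
from xml.sax import handler

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"> <!-- this is root --><!-- CONTENTS -->
</plist>
"""


class PrologLexicalHandler:
    """Hypothetical helper: records DOCTYPE and comment events.

    All five lexical callbacks are defined because the expat-based reader
    binds them when parsing starts; on Python 3.10+ you could subclass
    xml.sax.handler.LexicalHandler instead and only override what you need.
    """

    def __init__(self):
        self.doctype = None  # (name, public_id, system_id)
        self.comments = []

    def startDTD(self, name, public_id, system_id):
        self.doctype = (name, public_id, system_id)

    def endDTD(self):
        pass

    def comment(self, content):
        self.comments.append(content)

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


def read_doctype_with_sax(data):
    """Parse bytes with SAX and return the lexical handler that saw the events."""
    lexical = PrologLexicalHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(handler.feature_external_ges, False)  # never fetch external entities
    parser.setContentHandler(handler.ContentHandler())  # element events are ignored
    parser.setProperty(handler.property_lexical_handler, lexical)
    parser.parse(io.BytesIO(data))
    return lexical


if __name__ == "__main__":
    result = read_doctype_with_sax(SAMPLE)
    print(result.doctype)
    print(result.comments)
```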

Answer

I tried minidom, which, according to the link you provided, is vulnerable to billion laughs and quadratic blowup attacks. Here is my code:

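from xml.dom.minidom import parse

dom = parse('file.xml')
# XML declaration: minidom keeps version and encoding on the Document node
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))
# DOCTYPE: each of the following lines prints the same declaration
print(dom.doctype.toxml())
# or
print(dom.getElementsByTagName('plist')[0].previousSibling.toxml())
# or
print(dom.childNodes[0].toxml())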

Output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
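Instead of re-serializing with toxml(), the individual pieces can also be read from attributes. minidom stores the declaration's version and encoding on the Document node, and the DocumentType node exposes name, publicId and systemId. Here is a small sketch that also copes with files without a DOCTYPE, where dom.doctype is None. describe_header is a hypothetical helper name, and the sample is parsed from bytes with parseString rather than from file.xml so that it is self-contained.

```python
from xml.dom.minidom import parseString

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"> <!-- this is root --><!-- CONTENTS -->
</plist>
"""


def describe_header(dom):
    """Hypothetical helper: collect the prolog details minidom keeps on the Document."""
    doctype = dom.doctype  # None when the file has no <!DOCTYPE ...>
    return {
        "version": dom.version,
        "encoding": dom.encoding,
        "doctype_name": doctype.name if doctype else None,
        "public_id": doctype.publicId if doctype else None,
        "system_id": doctype.systemId if doctype else None,
    }


if __name__ == "__main__":
    print(describe_header(parseString(SAMPLE)))
```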

You can also use the minidom from defusedxml. I installed that package, changed the import line to "from defusedxml.minidom import parse", and the code worked with the same output.
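If building a whole DOM is more than you need, the standard library's low-level expat bindings (xml.parsers.expat) report the prolog through dedicated callbacks. XmlDeclHandler receives the <?xml ...?> declaration, StartDoctypeDeclHandler receives the DOCTYPE, and CommentHandler receives comments. In the sketch below, StartElementHandler raises an exception so that parsing stops as soon as the root element opens, which means only what precedes <plist> is collected. read_prolog and _StopAtRoot are hypothetical names. This is plain stdlib expat, not defusedxml, so it does not get defusedxml's extra checks.

```python
import xml.parsers.expat

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"> <!-- this is root --><!-- CONTENTS -->
</plist>
"""


class _StopAtRoot(Exception):
    """Hypothetical marker exception used to stop parsing at the root element."""


def read_prolog(data):
    """Collect the XML declaration, DOCTYPE and comments that precede the root element."""
    prolog = {"xml_decl": None, "doctype": None, "comments": [], "root": None}
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # standalone is 1 or 0 when declared, -1 when the clause is omitted
        prolog["xml_decl"] = {"version": version, "encoding": encoding, "standalone": standalone}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        prolog["doctype"] = {"name": name, "public_id": public_id, "system_id": system_id}

    def on_comment(text):
        prolog["comments"].append(text)

    def on_start_element(name, attrs):
        prolog["root"] = name
        raise _StopAtRoot  # everything after this belongs to the document body

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.CommentHandler = on_comment
    parser.StartElementHandler = on_start_element
    try:
        parser.Parse(data, True)
    except _StopAtRoot:
        pass
    return prolog


if __name__ == "__main__":
    print(read_prolog(SAMPLE))
```

Because parsing stops at <plist>, the two comments inside the root element are not collected; only comments placed between the DOCTYPE and the root would be.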

