How to read XML header in Python

2024/10/12 10:21:12

How can I read the header of an XML document in Python 3?

Ideally, I would use the defusedxml module as the documentation states that it's safer, but at this point (after hours of trying to figure this out), I'd settle for any parser.

For example, I have a document (this is actually from an exercise) that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"> <!-- this is root --><!-- CONTENTS -->
</plist>

I'm wondering how to access everything before the root node.

This seems like such a general question that I thought I would easily find an answer online, but I guess I was wrong. The closest thing I found was this question on Stack Overflow, which didn't really help (I looked into xml.sax, but couldn't find anything relevant).

Answer

I tried minidom which is vulnerable to billion laughs and quadratic blowup attacks according to link you provided. Here is my code:

from xml.dom.minidom import parsedom = parse('file.xml')
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))
print(dom.doctype.toxml())
#or
print(dom.getElementsByTagName('plist')[0].previousSibling.toxml())
#or
print(dom.childNodes[0].toxml())

Output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>

You can use minidom from defusedxml. I downloaded that package and just replaced import with from defusedxml.minidom import parse and code worked with same output.

https://en.xdnf.cn/q/69665.html

Related Q&A

Shift interpolation does not give expected behaviour

When using scipy.ndimage.interpolation.shift to shift a numpy data array along one axis with periodic boundary treatment (mode = wrap), I get an unexpected behavior. The routine tries to force the firs…

HEX decoding in Python 3.2

In Python 2.x Im able to do this:>>> 4f6c6567.decode(hex_codec) OlegBut in Python 3.2 I encounter this error:>>> b4f6c6567.decode(hex_codec) Traceback (most recent call last):File &qu…

How do I access session data in Jinja2 templates (Bottle framework on app engine)?

Im running the micro framework Bottle on Google App Engine. Im using Jinja2 for my templates. And Im using Beaker to handle the sessions. Im still a pretty big Python newbie and am pretty stoked I g…

What is a dimensional range of [-1,0] in Pytorch?

So Im struggling to understand some terminology about collections in Pytorch. I keep running into the same kinds of errors about the range of my tensors being incorrect, and when I try to Google for a …

Pyinstaller executable keeps opening

Background I am working on face recognition following this link and I would like to build a standalone application using Python. My main.py script looks like the following. # main.py# Import packages a…

Occasional deadlock in multiprocessing.Pool

I have N independent tasks that are executed in a multiprocessing.Pool of size os.cpu_count() (8 in my case), with maxtasksperchild=1 (i.e. a fresh worker process is created for each new task). The mai…

How to kill a subprocess called using subprocess.call in python? [duplicate]

This question already has answers here:How to terminate a python subprocess launched with shell=True(11 answers)Closed 7 years ago.I am calling a command using subprocess.call(cmd, shell=True)The comma…

Printing one color using imshow [closed]

Closed. This question needs debugging details. It is not currently accepting answers.Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to repro…

Send headers along in python [duplicate]

This question already has answers here:Changing user agent on urllib2.urlopen(9 answers)Closed 9 years ago.I have the following python script and I would like to send "fake" header informatio…

How to download image to memory using python requests?

Like here How to download image using requests , but to memory, using http://docs.python-requests.org.