Is Python bad at XML? [closed]

2024/10/9 8:31:37

EDIT

The use of the phrase "bad at XML" in this question has been a point of contention, so I'd like to start out by providing a very clear definition of what I mean by this term in this context: if support for standard XML APIs is poor, and forces one to use a language-specific API, in which namespaces seem to be an afterthought, then I would be inclined to characterize that language as being not as well suited to using XML as other mainstream languages that do not have these issues. "Bad at XML" is just a shorthand for these conditions, and I think it is a fair way to characterize it. As I will describe, my initial experience with Python has raised concerns about whether it fulfils these conditions; but, because in general my experience with Python has been quite positive, it seems likely that I'm missing something, thus motivating this question.

I'm trying to do some very simple XML processing with Python. I had initially hoped to be able to reuse my knowledge of standard W3C DOM API's, and happily found that the xml.dom and xml.dom.minidom modules did a good job of supporting these API's. Unfortunately, however, serialization proved to be problematic, for the following reasons:

  • xml.dom does not come with a serializer
  • the PyXML library, which includes a serializer for xml.dom, is no longer maintained, AND
  • minidom does not support serialization of namespaces, even though namespaces are supported in the API

I looked through the list of other W3C-like libraries here:

http://wiki.python.org/moin/PythonXml#W3CDOM-likelibraries

I found that many other libraries, such as 4Suite and libxml2dom, are also not maintained.

On the other hand, itools at first glance appears to be maintained, but there does not appear to be an Ubuntu/Debian package available, and so would be difficult to deploy and maintain.

At this point, it seemed like trying to use W3C DOM API's in my Python application was going to be dead-end, and I began to look at the ElementTree API. But the way the eTree API supports namespaces I think is horribly ugly, requiring one to use string concatenation every time an element in a particular namespace is created:

http://lxml.de/tutorial.html#namespaces

So, my question is, have I overlooked something, or is support for XML (in particular W3C DOM) actually quite bad in Python?

EDIT

Here follows a list of more precise questions, the answers to which would really help me:

  • Is there reasonable support for W3C DOM in Python?
  • If not xml.dom, do you use e.g. etree instead of W3C DOM?
  • If so, which library is best, and how do you overcome the issues regarding namespacing in the API?
  • If you use W3C DOM instead, are you aware of a library that implements serialization with support for namespaces?
Answer

I would say python handles XML pretty well. The number of different libraries available speaks to that - you have lots of options. And if there are features missing from libraries that you would like to use, feel free to contribute some patches!

I personally use the DOM and lxml.etree (etree is really fast). However, I feel your pain about the namespace thing. I wrote a quick helper function to deal with it:

DEFAULT_NS = "http://www.domain.org/path/to/xml"def add_xml_namespace(path, namespace=DEFAULT_NS):"""Adds namespaces to an XPath-ish expression path for etreeTest simple expression:>>> add_xml_namespace('image/namingData/fileBaseName')'{http://www.domain.org/path/to/xml}image/{http://www.domain.org/path/to/xml}namingData/{http://www.domain.org/path/to/xml}fileBaseName'More complicated expression>>> add_xml_namespace('.//image/*') './/{http://www.domain.org/path/to/xml}image/*'>>> add_xml_namespace('.//image/text()')'.//{http://www.domain.org/path/to/xml}image/text()'"""pattern = re.compile(r'^[A-Za-z0-9-]+$')tags = path.split('/')for i in xrange(len(tags)):if pattern.match(tags[i]):tags[i] = "{%s}%s" % (namespace, tags[i])return '/'.join(tags)

I use it like so:

from lxml import etree
from utilities import add_xml_namespace as nstree = etree.parse('file.xml')
node = tree.get_root().find(ns('root/group/subgroup'))
# etc. 

If you don't know the namespace ahead of time, you can extract it from a root node:

tree = etree.parse('file.xml')
root = tree.getroot().tag
namespace = root[1:root.index('}')]
ns = lambda path: add_xml_namespace(path, namespace)
...

Additional comment: There is a little work involved here, but work is necessary when dealing with XML. That's not a python issue, it's an XML issue.

https://en.xdnf.cn/q/70038.html

Related Q&A

Conditional color with matplotlib scatter

I have the following Pandas Dataframe, where column a represents a dummy variable:What I would like to do is to give my markers a cmap=jet color following the value of column b, except when the value i…

Cumulative sum with list comprehension

I have a list of integers:x = [3, 5, 2, 7]And I want to create a new list where the nth element is the sum of elements in x from 0 to n-1.This would result in:y = [0, 3, 8, 10]How can I do this with li…

How to get image size in KB while using Pillow-python before storing to disk?

Im using Pillow for image processing in Python,url="http://www.image.com/some_image.jpg";path = io.BytesIO(urllib.request.urlopen(url).read())original_image = Image.open(path)Any idea how i c…

Python: Print only one time inside a loop

I have a code where I want capture a video from a camera. I want to use Logging library of Python to get messages on the shell or export them to a text file.Here is a part of my code where inside the w…

Calling Different Functions in Python Based on Values in a List

I have a script that takes a list of metrics as an input, and then fetches those metrics from the database to perform various operations with them. My problem is that different clients get different s…

takes 1 positional argument but 2 were given

I would like to run a command line tool to run in a separate function and passed to the button click the additional command for this program but each time I get this as a response.takes 1 positional ar…

With py.test, database is not reset after LiveServerTestCase

I have a number of Django tests and typically run them using py.test. I recently added a new test case in a new file test_selenium.py. This Test Case has uses the LiveServerTestCase and StaticLiveSer…

Using flask wtforms validators without using a form

Im receiving user registration data from an iOS application and Id like to use the validators that come with wtforms to make sure the email and password are acceptable. However, Im not using a flask f…

How to install graph-tool for Anaconda Python 3.5 on linux-64?

Im trying to install graph-tool for Anaconda Python 3.5 on Ubuntu 14.04 (x64), but it turns out thats a real trick.I tried this approach, but run into the problem:The following specifications were foun…

How to quickly encrypt a password string in Django without an User Model?

Based on my current Django app settings, is there a function or a snippet that allows me to view the encrypted password, given a raw string? I am testing some functionality and this would be useful fo…