Question 1

I have the following XML document:

<node0><node1><node2 a1="x1"> ... </node2><node2 a1="x2"> ... </node2><node2 a1="x1"> ... </node2></node1>
</node0>

I want to filter out node2 when a1="x2". The user provides the XPath and the attribute values that need to be tested and filtered out. I looked at some Python solutions such as BeautifulSoup, but they are too complicated and don't preserve the case of the text. I want to keep the document the same as before, with only the matching elements filtered out.

Can you recommend a simple and succinct solution? From the looks of it, this should not be too complicated. The actual XML document is not as simple as the one above, but the idea is the same.

Question 2

This uses xml.etree.ElementTree which is in the standard library:

import xml.etree.ElementTree as xee

data = '''\
<node1><node2 a1="x1"> ... </node2><node2 a1="x2"> ... </node2><node2 a1="x1"> ... </node2>
</node1>
'''
doc = xee.fromstring(data)
# findall() returns a list, so removing elements while looping over it is safe
for tag in doc.findall('node2'):
    if tag.get('a1') == 'x2':
        doc.remove(tag)
print(xee.tostring(doc, encoding='unicode'))
# <node1><node2 a1="x1"> ... </node2><node2 a1="x1"> ... </node2>
# </node1>

This uses lxml, which is not in the standard library, but has a more powerful syntax:

import lxml.etree

data = '''\
<node1><node2 a1="x1"> ... </node2><node2 a1="x2"> ... </node2><node2 a1="x1"> ... </node2>
</node1>
'''
doc = lxml.etree.XML(data)
# findall() handles every match; find() would return only the first one
for e in doc.findall('node2[@a1="x2"]'):
    doc.remove(e)
print(lxml.etree.tostring(doc, encoding='unicode'))
# <node1><node2 a1="x1"> ... </node2><node2 a1="x1"> ... </node2>
# </node1>
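
One detail matters if the document should otherwise stay unchanged: in xml.etree.ElementTree, the text that follows an element's closing tag is stored on that element as .tail, so remove() discards that text too. The sketch below is one possible way to keep it. The helper name remove_keep_tail and the sample markup are made up for illustration and are not part of the question.

```python
import xml.etree.ElementTree as ET


def remove_keep_tail(parent, child):
    """Remove child from parent, but keep the text that followed it.

    Hypothetical helper: the tail is appended to the previous sibling's
    tail, or to the parent's text if child was the first element.
    """
    siblings = list(parent)
    index = siblings.index(child)
    if child.tail:
        if index == 0:
            parent.text = (parent.text or "") + child.tail
        else:
            previous = siblings[index - 1]
            previous.tail = (previous.tail or "") + child.tail
    parent.remove(child)


# Illustrative input with text after the element to be removed
root = ET.fromstring('<p>one <b a1="x2">two</b> three</p>')
remove_keep_tail(root, root.find("b"))
result = ET.tostring(root, encoding="unicode")
print(result)  # <p>one  three</p>
```

With a plain root.remove(...) the " three" would be lost along with the element. For documents whose tails hold only indentation whitespace, the difference is purely cosmetic.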

Edit: If node2 is buried more deeply in the XML, you can iterate through all the elements, check whether each one has a matching node2 child, and then remove that child if so:

Using only xml.etree.ElementTree:

doc = xee.fromstring(data)
# iter() replaces getiterator(), which was removed in Python 3.9
for parent in list(doc.iter()):
    for child in parent.findall('node2'):
        if child.get('a1') == 'x2':
            parent.remove(child)

Using lxml:

doc = lxml.etree.XML(data)
for parent in list(doc.iter('*')):
    for child in parent.findall('node2[@a1="x2"]'):
        parent.remove(child)
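
Since the question says the path and the attribute values come from the user, the loops above can be wrapped in a small function. This is a minimal sketch using only the standard library. The function name filter_elements and its parameters are assumptions for illustration, and the path must stay within the limited XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET


def filter_elements(root, path, attr, unwanted):
    """Remove elements matched by path whose attr value is in unwanted.

    Hypothetical helper. Returns the number of removed elements.
    """
    # ElementTree elements do not know their parent, so build a lookup first
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    for elem in root.findall(path):
        # elem.get() returns None when the attribute is missing
        if elem.get(attr) in unwanted and elem in parent_of:
            parent_of[elem].remove(elem)
            removed += 1
    return removed


data = (
    '<node0><node1>'
    '<node2 a1="x1">A</node2><node2 a1="x2">B</node2><node2 a1="x1">C</node2>'
    '</node1></node0>'
)
root = ET.fromstring(data)
count = filter_elements(root, ".//node2", "a1", {"x2"})
result = ET.tostring(root, encoding="unicode")
print(count)   # 1
print(result)  # <node0><node1><node2 a1="x1">A</node2><node2 a1="x1">C</node2></node1></node0>
```

Tag names, attribute names, and text are left exactly as they were parsed, so case is preserved. With lxml the parent lookup would not be needed, because its elements provide getparent().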

xml filtering with python
