XPath predicate with sub-paths with lxml?

2024/10/15 11:26:02

I'm trying to understand an XPath that was sent to me for use with ACORD XML forms (a common format in insurance). The XPath they sent me is (truncated for brevity):

./PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo

Where I'm running into trouble is that Python's lxml library is telling me that [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"] is an invalid predicate. I'm not able to find anywhere in the XPath spec on predicates which identifies this syntax so that I can modify this predicate to work.

Is there any documentation on what exactly this predicate is selecting? Also, is this even a valid predicate, or has something been mangled somewhere?

Possibly related:

I believe the company I am working with is an MS shop, so this XPath may be valid in C# or some other language in that stack? I'm not entirely sure.

Updates:

As requested in the comments, here is some additional info.

XML sample:

<ACORD><InsuranceSvcRq><HomePolicyQuoteInqRq><PersPolicy><PersApplicationInfo><InsuredOrPrincipal><InsuredOrPrincipalInfo><InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd></InsuredOrPrincipalInfo><GeneralPartyInfo><Addr><Addr1></Addr1></Addr></GeneralPartyInfo></InsuredOrPrincipal></PersApplicationInfo></PersPolicy></HomePolicyQuoteInqRq></InsuranceSvcRq>
</ACORD>

Code sample (with full XPath instead of snippet):

>>> from lxml import etree
>>> tree = etree.fromstring(raw)
>>> tree.find('./InsuranceSvcRq/HomePolicyQuoteInqRq/PersPolicy/PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo/Addr/Addr1')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 1409, in lxml.etree._Element.find (src/lxml/lxml.etree.c:39972)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 271, in find
    it = iterfind(elem, path, namespaces)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 261, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 245, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 207, in prepare_predicate
    raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate
Answer

Change tree.find to tree.xpath. find and findall are present in lxml to provide compatibility with other implementations of ElementTree. These methods do not implement the entire XPath language. To employ XPath expressions containing more advanced features, use the xpath method, the XPath class, or XPathEvaluator.
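The XPath class and XPathEvaluator mentioned above are used roughly as follows. This is a minimal sketch on a trimmed-down, hypothetical document (not the full ACORD sample), and the variable names are illustrative only.

```python
from lxml import etree

# Hypothetical, shortened document used only for this sketch.
raw = b"""<ACORD><PersApplicationInfo><InsuredOrPrincipal>
<InsuredOrPrincipalInfo><InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd></InsuredOrPrincipalInfo>
<GeneralPartyInfo/>
</InsuredOrPrincipal></PersApplicationInfo></ACORD>"""
root = etree.fromstring(raw)

expr = ('./PersApplicationInfo/InsuredOrPrincipal'
        '[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo')

# Option 1: the xpath() method on an element (or tree).
via_method = root.xpath(expr)

# Option 2: compile the expression once, then call it with a context node.
find_party = etree.XPath(expr)
via_class = find_party(root)

# Option 3: an evaluator bound to one element; call it with expressions.
evaluator = etree.XPathEvaluator(root)
via_evaluator = evaluator(expr)

print([el.tag for el in via_method])
```

All three return a list of matching elements. Compiling with etree.XPath is mainly worthwhile when the same expression is evaluated many times.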

For example:

import io
import lxml.etree as ET

# Bytes literal, so io.BytesIO accepts it on Python 3 as well
content = b'''\
<ACORD><InsuranceSvcRq><HomePolicyQuoteInqRq><PersPolicy><PersApplicationInfo><InsuredOrPrincipal><InsuredOrPrincipalInfo><InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd></InsuredOrPrincipalInfo><GeneralPartyInfo><Addr><Addr1></Addr1></Addr></GeneralPartyInfo></InsuredOrPrincipal></PersApplicationInfo></PersPolicy></HomePolicyQuoteInqRq></InsuranceSvcRq>
</ACORD>
'''
tree = ET.parse(io.BytesIO(content))
# Full XPath 1.0 is available through xpath(), including sub-paths inside predicates
path = '//PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo'
result = tree.xpath(path)
print(result)

yields

[<Element GeneralPartyInfo at b75a8194>]

while tree.find yields

SyntaxError: invalid node predicate
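
As for what the predicate selects: [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"] is standard XPath 1.0 and keeps only those InsuredOrPrincipal elements that have at least one InsuredOrPrincipalInfo child with an InsuredOrPrincipalRoleCd child whose text is exactly "AN". If you have to stay with the find-style API, the same filter can be written by hand. The sketch below uses the standard library's xml.etree.ElementTree so that it runs without lxml (lxml elements offer the same iterfind method); the second role code "XX", the id attributes, and the helper name party_info_for_role are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two principals, only the first has role code "AN".
raw = """<PersApplicationInfo>
  <InsuredOrPrincipal>
    <InsuredOrPrincipalInfo><InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd></InsuredOrPrincipalInfo>
    <GeneralPartyInfo id="applicant"/>
  </InsuredOrPrincipal>
  <InsuredOrPrincipal>
    <InsuredOrPrincipalInfo><InsuredOrPrincipalRoleCd>XX</InsuredOrPrincipalRoleCd></InsuredOrPrincipalInfo>
    <GeneralPartyInfo id="other"/>
  </InsuredOrPrincipal>
</PersApplicationInfo>"""
root = ET.fromstring(raw)


def party_info_for_role(app_info, role):
    """Hand-written equivalent of
    InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd=role]/GeneralPartyInfo
    """
    matches = []
    for principal in app_info.iterfind('InsuredOrPrincipal'):
        codes = principal.iterfind('InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd')
        # XPath "=" on a node-set is true if ANY node in the set matches.
        if any(code.text == role for code in codes):
            matches.extend(principal.iterfind('GeneralPartyInfo'))
    return matches


selected = party_info_for_role(root, 'AN')
print([el.get('id') for el in selected])  # prints ['applicant']
```

With lxml available, tree.xpath remains the simpler route; the loop is only meant to spell out what the predicate does.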