Schematron validation with lxml in Python: how to retrieve validation errors?

2024/9/8 8:49:29

I'm trying to do some Schematron validation with lxml. For the specific application I'm working on, it's important that any tests that fail validation are reported back. The lxml documentation mentions the validation_report property. I think this should contain the information I'm looking for, but I just can't figure out how to work with it. Here's some example code that demonstrates my problem (adapted from another example; tested with Python 2.7.4):

import StringIO
from lxml import isoschematron
from lxml import etree

def main():
    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="" >
    <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
    <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
    </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report=True)

    # XML to validate - validation will fail because sum of numbers
    # not equal to 100
    notValid = StringIO.StringIO('''\
    <Total>
    <Percent>30</Percent>
    <Percent>30</Percent>
    <Percent>50</Percent>
    </Total>
    ''')

    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report (assuming here this is where reason
    # for validation failure is stored, but perhaps I'm wrong?)
    report = isoschematron.Schematron.validation_report

    print("is valid: " + str(validationResult))
    print(dir(report.__doc__))

main()

Now, from the value of validationResult I can see that the validation failed (as expected), so next I would like to know why. The result of the second print statement gives me:

['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
 '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
 '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__',
 '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__',
 '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split',
 '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode',
 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha',
 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust',
 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust',
 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith',
 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Which is about as far as I'm getting, based on the documentation and this related question. Could well be something really obvious I'm overlooking?
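For readers puzzled by that listing: it is just the attribute list of a Python 2 string. Reading a property from a class, rather than from an instance of it, returns the property object itself, and its __doc__ is a plain string. The sketch below illustrates this with a hypothetical Validator class; the class and its names are made up for illustration and are not lxml's implementation.

```python
class Validator(object):
    """Hypothetical stand-in for a class that exposes a report via a property."""

    def __init__(self):
        self._report = "report for this instance"

    @property
    def report(self):
        """Report produced by the last validation."""
        return self._report


# Accessed on the class: we get the property object, not a report.
via_class = Validator.report

# Accessed on an instance: the getter runs and returns the stored value.
via_instance = Validator().report

print(type(via_class).__name__)          # name of the type returned by class access
print(type(via_class.__doc__).__name__)  # the docstring is an ordinary string
print(via_instance)
```

Calling dir() on via_class.__doc__ therefore lists string methods such as 'capitalize' and 'zfill', which matches the output shown in the question.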


OK, so someone on Twitter gave me a suggestion that made me realise my mistake: I was reading validation_report from the Schematron class itself rather than from my schematron instance. Since there don't seem to be any clear examples, I'll share my working solution below:

import StringIO
from lxml import isoschematron
from lxml import etree

def main():
    # Example adapted from
    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="" >
    <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
    <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
    </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report=True)

    # XML to validate - validation will fail because sum of numbers
    # not equal to 100
    notValid = StringIO.StringIO('''\
    <Total>
    <Percent>30</Percent>
    <Percent>30</Percent>
    <Percent>50</Percent>
    </Total>
    ''')

    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report
    report = schematron.validation_report

    print("is valid: " + str(validationResult))
    print(type(report))
    print(report)

main()

The print statement on the report now results in the following output:

<?xml version="1.0" standalone="yes"?>
<svrl:schematron-output xmlns:svrl="" xmlns:xs="" xmlns:schold="" xmlns:sch="" xmlns:iso="" title="" schemaVersion="">
  <!--   -->
  <svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/>
  <svrl:fired-rule context="Total"/>
  <svrl:failed-assert test="sum(//Percent)=100" location="/Total">
    <svrl:text>Sum is not 100%.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>

Which is exactly what I was looking for!
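To get at the individual failures programmatically rather than printing the whole report, the failed-assert elements can be pulled out of the report tree. Below is a minimal sketch, not a definitive implementation. It assumes the report uses the standard SVRL namespace, http://purl.oclc.org/dsdl/svrl; the helper name failed_asserts and the SAMPLE string are made up for illustration. To stay self-contained it parses a sample report with the standard library's ElementTree, but the same element API (iter, find, get) is available on lxml elements.

```python
import xml.etree.ElementTree as ET

# Assumption: the report uses the standard SVRL namespace.
SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Hypothetical sample report, modelled on the output printed above.
SAMPLE = '''<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/>
  <svrl:fired-rule context="Total"/>
  <svrl:failed-assert test="sum(//Percent)=100" location="/Total">
    <svrl:text>Sum is not 100%.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>'''


def failed_asserts(report_root):
    """Return a (location, test, message) tuple for each svrl:failed-assert."""
    results = []
    for failed in report_root.iter("{%s}failed-assert" % SVRL_NS):
        text = failed.find("{%s}text" % SVRL_NS)
        # The message lives in the svrl:text child; tolerate a missing or empty one.
        message = (text.text or "").strip() if text is not None else ""
        results.append((failed.get("location"), failed.get("test"), message))
    return results


failures = failed_asserts(ET.fromstring(SAMPLE))
for location, test, message in failures:
    print("%s: %s (test: %s)" % (location, message, test))
```

With lxml, the root element of the stored report should be obtainable via schematron.validation_report.getroot() and can be passed to the same helper.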

