Schematron validation with lxml in Python: how to retrieve validation errors?

2024/9/8 8:49:29

I'm trying to do some Schematron validation with lxml. For the specific application I'm working on, it's important that any tests that failed the validation are reported back. The lxml documentation mentions the validation_report property. I think this should contain the info I'm looking for, but I just can't figure out how to work with it. Here's some example code that demonstrates my problem (adapted from http://lxml.de/validation.html#id2; tested with Python 2.7.4):

import StringIO
from lxml import isoschematron
from lxml import etree

def main():
    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
    <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
    </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report=True)

    # XML to validate - validation will fail because sum of numbers
    # not equal to 100
    notValid = StringIO.StringIO('''\
    <Total>
    <Percent>30</Percent>
    <Percent>30</Percent>
    <Percent>50</Percent>
    </Total>
    ''')

    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report (assuming here this is where reason
    # for validation failure is stored, but perhaps I'm wrong?)
    report = isoschematron.Schematron.validation_report

    print("is valid: " + str(validationResult))
    print(dir(report.__doc__))

main()

Now, from the value of validationResult I can see that the validation failed (as expected), so next I would like to know why. The result of the second print statement gives me:

['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__',
'__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '_formatter_field_name_split', '_formatter_parser',
'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs',
'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip',
'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit',
'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper', 'zfill']

That is about as far as I can get based on the documentation and this related question. Am I overlooking something really obvious?

Answer

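OK, so someone on Twitter gave me a suggestion that made me realise my mistake: I was reading validation_report from the Schematron class (isoschematron.Schematron.validation_report) rather than from the schematron instance I had just used for validation. Since there don't seem to be any clear examples, I'll share my working solution below: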
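Before the full listing, here is a small sketch of why the original attempt printed a list of string methods. It assumes that lxml exposes validation_report as an ordinary Python property, which is what the documentation's wording suggests. The Validator class is a hypothetical stand-in for isoschematron.Schematron, not lxml code, so the sketch runs without lxml installed.

```python
class Validator(object):
    """Hypothetical stand-in for isoschematron.Schematron."""

    def __init__(self):
        # In lxml this would be filled in by validate(); here it is a dummy.
        self._validation_report = "<report for this instance>"

    @property
    def validation_report(self):
        """Report produced by the last validation run."""
        return self._validation_report


# Looked up on the class, the name yields the property descriptor itself.
on_class = Validator.validation_report
# Looked up on an instance, the getter runs and returns the stored value.
on_instance = Validator().validation_report

print(type(on_class).__name__)          # property
print(type(on_class.__doc__).__name__)  # str
print(on_instance)                      # <report for this instance>
```

So dir(report.__doc__) in the question was listing the methods of the property's docstring, which is a plain string, and never touched a report at all. The corrected listing follows.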

import StringIO
from lxml import isoschematron
from lxml import etree

def main():
    # Example adapted from http://lxml.de/validation.html#id2

    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
    <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
    </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report=True)

    # XML to validate - validation will fail because sum of numbers
    # not equal to 100
    notValid = StringIO.StringIO('''\
    <Total>
    <Percent>30</Percent>
    <Percent>30</Percent>
    <Percent>50</Percent>
    </Total>
    ''')

    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report
    report = schematron.validation_report

    print("is valid: " + str(validationResult))
    print(type(report))
    print(report)

main()

The print statement on the report now results in the following output:

 <?xml version="1.0" standalone="yes"?>
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:schold="http://www.ascc.net/xml/schematron" xmlns:sch="http://www.ascc.net/xml/schematron" xmlns:iso="http://purl.oclc.org/dsdl/schematron" title="" schemaVersion=""><!--   --><svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/><svrl:fired-rule context="Total"/><svrl:failed-assert test="sum(//Percent)=100" location="/Total"><svrl:text>Sum is not 100%.</svrl:text></svrl:failed-assert>
</svrl:schematron-output>

Which is exactly what I was looking for!
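To turn that report into something a program can act on, one option is to walk the tree and collect every svrl:failed-assert element. The following is a minimal sketch, not part of lxml: failed_asserts and SVRL_NS are names I made up for illustration. It only uses iter(), get() and findtext(), which both lxml trees and the standard library's ElementTree provide, so the demo parses a trimmed copy of the report above with the standard library and runs without lxml.

```python
import xml.etree.ElementTree as ET

# Namespace of the SVRL report, taken from the output shown above.
SVRL_NS = "http://purl.oclc.org/dsdl/svrl"


def failed_asserts(report):
    """Return (location, test, message) for each svrl:failed-assert in report.

    Hypothetical helper: `report` is any tree with the ElementTree API.
    """
    failures = []
    for node in report.iter("{%s}failed-assert" % SVRL_NS):
        message = node.findtext("{%s}text" % SVRL_NS, default="")
        failures.append((node.get("location"), node.get("test"), message.strip()))
    return failures


# Trimmed copy of the report printed above, used only for this demo.
svrl = '''<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
<svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/>
<svrl:fired-rule context="Total"/>
<svrl:failed-assert test="sum(//Percent)=100" location="/Total">
<svrl:text>Sum is not 100%.</svrl:text>
</svrl:failed-assert>
</svrl:schematron-output>'''

report = ET.ElementTree(ET.fromstring(svrl))
for location, test, message in failed_asserts(report):
    print("%s: %s (test: %s)" % (location, message, test))
# prints: /Total: Sum is not 100%. (test: sum(//Percent)=100)
```

With lxml, the same helper should accept schematron.validation_report directly in place of the hand-built report. Note also that the StringIO module used in the listings above exists only in Python 2; on Python 3 the io module provides the equivalent classes.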
