Validate with three xml schemas as one combined schema in lxml?

2024/11/15 11:17:56

I am generating an XML document for which different XSDs have been provided for different parts (which is to say, definitions for some elements are in certain files, definitions for others are in others).

The XSD files do not refer to each other. The schemas are:

  1. http://xmlgw.companieshouse.gov.uk/v2-1/schema/Egov_ch-v2-0.xsd
  2. http://xmlgw.companieshouse.gov.uk/v1-1/schema/forms/FormSubmission-v1-1.xsd
  3. http://xmlgw.companieshouse.gov.uk/v1-1/schema/forms/CompanyIncorporation-v1-2.xsd

Is there a way to validate the document against all of the schemas using lxml?

The solution here is not simply to validate individually against each schema, because the problem I am having is that validation fails because of elements not specified in the XSD. For example, when validating against http://xmlgw.companieshouse.gov.uk/v2-1/schema/Egov_ch-v2-0.xsd, I get the error:

  File "lxml.etree.pyx", line 3006, in lxml.etree._Validator.assertValid (src/lxml/lxml.etree.c:125415)
DocumentInvalid: Element '{http://xmlgw.companieshouse.gov.uk}CompanyIncorporation': No matching global element declaration available, but demanded by the strict wildcard., line 9

Because the document in question contains a {http://xmlgw.companieshouse.gov.uk}CompanyIncorporation element, which is not specified in the XSD being validated against, but in one of the other XSD files.

Answer

I believe you should only be validating against Egov_ch-v2-0.xsd, which appears to define an envelope document. (This is the document you are creating, right? You haven't showed your XML.)

This schema uses <xs:any namespace="##any" minOccurs="0"/> to define body contents of the envelope. However, xsd:any does not mean "ignore all contents." Rather it means "accept anything here." Whether to validate or ignore the contents is controlled by the processContents attribute, which defaults to strict. This means that any elements discovered here must validate against types available to the schema. However, Egov_ch-v2-0.xsd does not import CompanyIncorporation-v1-2.xsd, so it doesn't know about the CompanyIncorporation element, so the document does not validate.

You need to add xsd:import elements to your main schema (Egov_ch-v2-0.xsd) to import all other schemas that may be used in the document. You can either do this in the xsd file itself, or you can add the elements programmatically after parsing:

xsd = lxml.etree.parse('http://xmlgw.companieshouse.gov.uk/v2-1/schema/Egov_ch-v2-0.xsd')
newimport = lxml.etree.Element('{http://www.w3.org/2001/XMLSchema}import',namespace="http://xmlgw.companieshouse.gov.uk",schemaLocation="http://xmlgw.companieshouse.gov.uk/v1-1/schema/forms/CompanyIncorporation-v1-2.xsd")
xsd.getroot().append(newimport)validator = lxml.etree.XMLSchema(xsd)

You can even do this in a generic way with a function that takes a list of schema paths and returns a list of xsd:import statements with namespace and schemaLocation set by parsing targetNamespace.

(As an aside, you should probably download these schema documents and reference them with filesystem paths rather than load them over the network.)

https://en.xdnf.cn/q/71471.html

Related Q&A

An unusual Python syntax element frequently used in Matplotlib

One proviso: The syntax element at the heart of my Question is in the Python language; however, this element appears frequently in the Matplotlib library, which is the only context i have seen it. So w…

Control the power of a usb port in Python

I was wondering if it could be possible to control the power of usb ports in Python, using vendor ids and product ids. It should be controlling powers instead of just enabling and disabling the ports. …

Threads and local proxy in Werkzeug. Usage

At first I want to make sure that I understand assignment of the feature correct. The local proxy functionality assigned to share a variables (objects) through modules (packages) within a thread. Am I …

Unable to use google-cloud in a GAE app

The following line in my Google App Engine app (webapp.py) fails to import the Google Cloud library:from google.cloud import storageWith the following error:ImportError: No module named google.cloud.st…

Multiple thermocouples on raspberry pi

I am pretty new to the GPIO part of the raspberry Pi. When I need pins I normally just use Arduino. However I would really like this project to be consolidated to one platform if possible, I would li…

Strange behaviour when mixing abstractmethod, classmethod and property decorators

Ive been trying to see whether one can create an abstract class property by mixing the three decorators (in Python 3.9.6, if that matters), and I noticed some strange behaviour. Consider the following …

Center the third subplot in the middle of second row python

I have a figure consisting of 3 subplots. I would like to locate the last subplot in the middle of the second row. Currently it is located in the left bottom of the figure. How do I do this? I cannot …

Removing columns which has only nan values from a NumPy array

I have a NumPy matrix like the one below:[[182 93 107 ..., nan nan -1][182 93 107 ..., nan nan -1][182 93 110 ..., nan nan -1]..., [188 95 112 ..., nan nan -1][188 97 115 ..., nan nan -1][188 95 112 ..…

how to get kubectl configuration from azure aks with python?

I create a k8s deployment script with python, and to get the configuration from kubectl, I use the python command:from kubernetes import client, configconfig.load_kube_config()to get the azure aks conf…

Object vs. Dictionary: how to organise a data tree?

I am programming some kind of simulation with its data organised in a tree. The main object is World which holds a bunch of methods and a list of City objects. Each City object in turn has a bunch of m…