How to generate DTD from XML?

2024/11/16 1:24:38

Can a DTD be generated from an XML file using Python?

Answer

The simple answer to the question you ask is "yes, a DTD can be generated from an XML document using Python".

Python is a Turing-complete language, and there are algorithms for generating a DTD from any arbitrary collection of XML or SGML. I believe the standard reference is Rick Kazman, "Structuring the text of the Oxford English Dictionary through finite state transduction," Centre for the New Oxford English Dictionary Tech. Report OED-86-20, Univ. of Waterloo (June 1986), 117 pp.

In the late 1980s, the library consortium OCLC developed a tool called Fred, which induced DTD for bodies of SGML documents; I heard a lot about it informally but do not recall ever seeing published descriptions of its algorithms. However, a quick search of the Web for "OCLC Fred SGML DTD" produces a pointer to Keith E. Shafer, Fred: the SGML Grammar Builder (1996). (A quick glance showed a great deal of material, but I did not see any clear reference to a high-level description of the algorithms used.)

There is also a Norwegian thesis from 1994: Sunniva M. K. Solstrand, "Automatisk generering av DTD fra SGML-kodet materiale", Hovedfagsoppgave i informasjonsvitenskap, Universitetet i Bergen 1994).

As may be seen, there are several computer scientists who do not agree with the commenters who have told you your question is pointless or wrong. It is true, of course, that the quality of document grammar achieved by automatic grammar induction tends to be lower than the quality of document grammar achieved by a human document analyst and DTD writer.

I suspect that the DTD generated would be more plausible if it restricted itself to the content-model patterns described in various articles by Fabio Vitali and his collaborators in Bologna. The initial paper was, I believe, Fabio Vitali, Angelo Di Iorio, and Daniele Gubellini, "Design patterns for descriptive document substructures", Extreme Markup Languages 2005, and later papers have elaborated and described applications. New work in Bologna by Francesco Poggi (not yet published) extends and deepens the analysis. A Web search for "XML design patterns" may provide other attempts at similar sets of grammatical patterns. From a grammar-induction point of view, the effect of such patterns is to reduce the complexity of the induction problem by targeting simpler grammars.


If you meant to ask the rather different question "Can anyone recommend a Python-based tool for generating a DTD from an XML document?", then I can't help you (and there are lots of Stack Overflow moderators who will close the question at once because questions asking for tool recommendations are frowned upon).

https://en.xdnf.cn/q/120389.html

Related Q&A

I have a very big list of dictionaries and I want to sum the insides

Something like{A: 3, 45, 34, 4, 2, 5, 94, 2139, 230345, 283047, 230847}, {B: 92374, 324, 345, 345, 45879, 34857987, 3457938457), {C: 23874923874987, 2347}How can I reduce that to {A: 2304923094820398},…

How to debug a TypeError no attribute __getitem__? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 10 years ago.Improv…

Change character based off of its position? Python 2.7

I have a string on unknown length, contain the characters a-z A-Z 0-9. I need to change each character using their position from Left to Right using a dictionary.Example:string = "aaaaaaaa" d…

Changing type using str() and int()...how it works

If I do this, I get:>>> x = 1 >>> y = 2 >>> type(x) <class int> >>> type(y) <class str>That all makes sense to me, except that if I convert using:>>…

Caesar cypher in Python

I am new to python and I want to write a program that asks for a line of input (each number will be separated by a single space.) It would use a simple letter-number substitution cipher. Each letter w…

Flatten a list with sublists and strings [duplicate]

This question already has answers here:Flatten an irregular (arbitrarily nested) list of lists(54 answers)How do I make a flat list out of a list of lists?(32 answers)Closed 2 years ago.How can I flat…

Guitar string code in Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.Want to improve this question? Update the question so it focuses on one problem only by editing this post.Closed 1…

What is the point of initializing extensions with the flask instance in Flask? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 8 years ago.Improve…

Nested list doesnt work properly

import re def get_number(element):re_number = re.match("(\d+\.?\d*)", element)if re_number:return float(re_number.group(1))else:return 1.0def getvalues(equation):elements = re.findall("…

How can I increment array with loop?

Ive got two lists:listOne = [33.325556, 59.8149016457, 51.1289412359] listTwo = [2.5929778, 1.57945488999, 8.57262235411]I use len to tell me how many items are in my list:itemsInListOne = int(len(list…