Given the simple XML data below:
<book>
  <title>My First Book</title>
  <abstract>
    <para>First paragraph of the abstract</para>
    <para>Second paragraph of the abstract</para>
  </abstract>
  <keywordSet>
    <keyword>First keyword</keyword>
    <keyword>Second keyword</keyword>
    <keyword>Third keyword</keyword>
  </keywordSet>
</book>
How can I traverse the tree, using lxml, and get all paragraphs in the "abstract" element, as well as all keywords in the "keywordSet" element?
The code snippet below returns only the first line of text in each element:
from lxml import objectify

root = objectify.fromstring(xml_string)  # xml_string contains the XML data above
print(root.title)  # prints the book title
for line in root.abstract:
    print(line.para)  # prints only the first paragraph
for word in root.keywordSet:
    print(word.keyword)  # prints only the first keyword in the set
I tried to follow this example, but the code above doesn't work as expected.
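As far as I can tell, the behaviour comes from how objectify resolves attribute access: `root.abstract` is the first `<abstract>` child, and iterating over an objectified element walks its siblings with the same tag rather than its children. So the loops above run once, and `line.para` is just the first `<para>`. A minimal sketch of that reading, iterating over `root.abstract.para` instead (the list names `abstracts`, `paragraphs` and `keywords` are only for illustration):

```python
from lxml import objectify

xml_string = (
    "<book><title>My First Book</title>"
    "<abstract><para>First paragraph of the abstract</para>"
    "<para>Second paragraph of the abstract</para></abstract>"
    "<keywordSet><keyword>First keyword</keyword>"
    "<keyword>Second keyword</keyword>"
    "<keyword>Third keyword</keyword></keywordSet></book>"
)

root = objectify.fromstring(xml_string)

# Iterating root.abstract walks <abstract> siblings; there is only one here.
abstracts = [el.tag for el in root.abstract]

# root.abstract.para is the first <para>; iterating it walks all <para> siblings.
paragraphs = [para.text for para in root.abstract.para]
keywords = [kw.text for kw in root.keywordSet.keyword]

print(abstracts)
print(paragraphs)
print(keywords)
```

This only picks up children with the named tag, which may or may not be what is wanted if an element mixes several kinds of children.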
On a different tack, it would be better still to read the entire XML tree into a Python dictionary, with each element name as a key and the element's text (or texts) as the value. I found out that something like this might be possible using lxml objectify, but I couldn't figure out how to achieve it.
One really big problem I keep running into when writing XML parsing code in Python is that most of the "examples" provided are too simple and entirely fictitious to be of much help, or else they are just the opposite, using overly complicated, automatically generated XML data!
Could anybody give me a hint?
Thanks in advance!
EDIT: After posting this question, I found a simple solution here.
So, my updated code becomes:
from lxml import objectify

root = objectify.fromstring(xml_string)  # xml_string contains the XML data above
print(root.title)  # prints the book title
for para in root.abstract.iterchildren():
    print(para)  # now prints the text of all paragraphs
for keyword in root.keywordSet.iterchildren():
    print(keyword)  # now prints all keywords in the set
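For the dictionary idea mentioned earlier, here is a rough sketch of one possible approach. It uses plain `lxml.etree` rather than objectify, because iterating a plain element yields its children, which makes the recursion simpler. The helper name `element_to_dict` is my own, not part of lxml, and the sketch assumes the data has no attributes, comments, or mixed content, as in the sample above.

```python
from lxml import etree

xml_string = (
    "<book><title>My First Book</title>"
    "<abstract><para>First paragraph of the abstract</para>"
    "<para>Second paragraph of the abstract</para></abstract>"
    "<keywordSet><keyword>First keyword</keyword>"
    "<keyword>Second keyword</keyword>"
    "<keyword>Third keyword</keyword></keywordSet></book>"
)


def element_to_dict(element):
    """Return the text of a leaf element, or a dict of child tag -> list of values."""
    children = list(element)  # direct children of a plain etree element
    if not children:
        return element.text
    result = {}
    for child in children:
        # Always collect into a list so repeated tags (para, keyword) are kept.
        result.setdefault(child.tag, []).append(element_to_dict(child))
    return result


root = etree.fromstring(xml_string)
book = {root.tag: element_to_dict(root)}

print(book["book"]["title"])
print(book["book"]["abstract"][0]["para"])
print(book["book"]["keywordSet"][0]["keyword"])
```

Every child ends up in a list, even single ones such as `title`, so lookups are uniform at the cost of an extra `[0]` here and there.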