Parsing RSS with ElementTree in Python

2024/10/2 1:28:35

How do you search for namespace-prefixed tags in XML using ElementTree in Python?

I have an XML/RSS document like:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="" xmlns:wfw="" xmlns:dc="" xmlns:wp="">
<channel>
  <title>sometitle</title>
  <pubDate>Tue, 28 Aug 2012 22:36:02 +0000</pubDate>
  <generator></generator>
  <language>en</language>
  <wp:wxr_version>1.0</wp:wxr_version>
  <wp:category>
    <wp:category_nicename>apache</wp:category_nicename>
    <wp:category_parent></wp:category_parent>
    <wp:cat_name><![CDATA[Apache]]></wp:cat_name>
  </wp:category>

But when I try to find all "wp:category" tags by doing:

import xml.etree.ElementTree as xml
tree = xml.parse(fn)
doc = tree.getroot()
categories = doc.findall('channel/wp:category')

I get the error:

SyntaxError: prefix 'wp' not found in prefix map

Searching for tags without a namespace prefix works just fine. What am I doing wrong?


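ElementTree does not apply the document's own prefix declarations when it evaluates a search path, so you need to handle the namespace prefixes yourself: either use iterparse and handle the start-ns events directly, or explicitly declare the prefixes you're interested in and pass that mapping to findall. Depending on what you're trying to do, stripping the prefixes may also be enough; I will admit that in my lazier moments I just remove them all with a string replace before parsing the XML.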
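As a minimal sketch of the "declare the prefixes" option: findall and findtext accept a prefix-to-URI mapping as their namespaces argument. The namespace URI below is a placeholder (the URIs in the question's sample are blank), so substitute whatever xmlns:wp declares in your real file. ET is used as the module alias instead of xml to avoid shadowing the package name.

```python
import xml.etree.ElementTree as ET

# Placeholder URI (an assumption): use the value of xmlns:wp from your file.
WP_NS = "http://example.com/wp"

SAMPLE = f"""<rss version="2.0" xmlns:wp="{WP_NS}">
  <channel>
    <title>sometitle</title>
    <wp:category>
      <wp:category_nicename>apache</wp:category_nicename>
      <wp:cat_name><![CDATA[Apache]]></wp:cat_name>
    </wp:category>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE)

# Map each prefix used in the search path to its namespace URI.
ns = {"wp": WP_NS}

categories = root.findall("channel/wp:category", ns)
names = [c.findtext("wp:cat_name", namespaces=ns) for c in categories]

# The same search without a prefix map, spelling the tag as {uri}localname.
same = root.findall(f"channel/{{{WP_NS}}}category")
```

The prefix in the mapping only has to match the prefix in your search path, not the one used in the document; what must match is the URI.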

EDIT: this similar question might help.
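For the iterparse route, a rough sketch is to collect the prefix map from the start-ns events while the document is parsed, then reuse it for searches. The helper name parse_with_prefixes and the namespace URI in the sample are made up for illustration; they are not part of ElementTree or of the original feed.

```python
import io
import xml.etree.ElementTree as ET


def parse_with_prefixes(source):
    """Parse source and return (root, prefixes).

    prefixes maps each declared prefix to its URI, as reported by the
    parser's "start-ns" events. Hypothetical helper, not a library function.
    """
    prefixes = {}
    root = None
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # payload is a (prefix, uri) tuple
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # the first "start" event is the root element
    return root, prefixes


# Stand-in document with a placeholder namespace URI (an assumption).
SAMPLE_BYTES = b"""<rss version="2.0" xmlns:wp="http://example.com/wp">
  <channel>
    <wp:category><wp:cat_name>Apache</wp:cat_name></wp:category>
  </channel>
</rss>"""

root, prefixes = parse_with_prefixes(io.BytesIO(SAMPLE_BYTES))
found = root.findall("channel/wp:category", prefixes)
```

If the same prefix is bound to different URIs in different parts of a document, this simple dictionary keeps only the last binding, so it suits feeds that declare everything on the root element.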

