Parsing RSS with ElementTree in Python

2024/10/2 1:28:35

How do you search for namespace-specific tags in XML using ElementTree in Python?

I have an XML/RSS document like:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:wp="http://wordpress.org/export/1.0/"
>
<channel>
    <title>sometitle</title>
    <pubDate>Tue, 28 Aug 2012 22:36:02 +0000</pubDate>
    <generator>http://wordpress.org/?v=2.5.1</generator>
    <language>en</language>
    <wp:wxr_version>1.0</wp:wxr_version>
    <wp:category>
        <wp:category_nicename>apache</wp:category_nicename>
        <wp:category_parent></wp:category_parent>
        <wp:cat_name><![CDATA[Apache]]></wp:cat_name>
    </wp:category>
</channel>
</rss>

But when I try to find all "wp:category" tags with:

import xml.etree.ElementTree as xml
tree = xml.parse(fn)
doc = tree.getroot()
categories = doc.findall('channel/wp:category')

I get the error:

SyntaxError: prefix 'wp' not found in prefix map

Searching for any non-namespace specific fields works just fine. What am I doing wrong?

Answer

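You need to handle the namespace prefixes yourself. ElementTree stores each tag as {namespace-uri}name and does not keep the prefixes used in the document, so findall cannot resolve "wp:" on its own. You can either use iterparse and handle the namespace events directly, or explicitly declare the prefixes you're interested in (a mapping from each prefix to its namespace URI) and pass that mapping when you search. Depending on what you're trying to do, something cruder may be enough: I will admit that in my lazier moments I just strip all the prefixes out with a string replace before parsing the XML.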
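
As a rough illustration of the two approaches above, here is a minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree. The feed is a trimmed copy of the one in the question, inlined so the snippet runs on its own; the names RSS, NS, and collect_namespaces are made up for this example and are not part of any library.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question, inlined as bytes so the
# sketch is self-contained (the real code would parse a file instead).
RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
<title>sometitle</title>
<wp:category>
<wp:category_nicename>apache</wp:category_nicename>
<wp:cat_name><![CDATA[Apache]]></wp:cat_name>
</wp:category>
</channel>
</rss>
"""

# Approach 1: declare the prefix yourself and pass the map to the search.
# The prefix name is arbitrary; only the URI has to match the document.
NS = {"wp": "http://wordpress.org/export/1.0/"}

root = ET.fromstring(RSS)
categories = root.findall("channel/wp:category", NS)
names = [c.findtext("wp:cat_name", namespaces=NS) for c in categories]

# Equivalent search without a map, spelling out the URI in braces.
same = root.findall("channel/{http://wordpress.org/export/1.0/}category")


# Approach 2: let iterparse report the namespace declarations it sees.
# "start-ns" events yield (prefix, uri) pairs as they are encountered.
def collect_namespaces(source):
    """Hypothetical helper: build a prefix -> URI map from a file-like object."""
    return {
        prefix: uri
        for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",))
    }


discovered = collect_namespaces(io.BytesIO(RSS))

print(names)
print(discovered)
```

For the code in the question, the first approach amounts to changing the last line to doc.findall('channel/wp:category', NS). The iterparse variant reads the document a second time, so it is mostly useful when the prefixes are not known in advance.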

EDIT: this similar question might help.
