
Universal Feedparser and XML namespaces

Post author: mikal
Post published: July 10, 2008
Post category: Python

I've always found Python's Universal Feedparser to be a bit hard to work with when using feeds with XML namespaces. Specifically, if you don't care about the stuff in the namespaces then you're fine, but if you want that data it gets a lot harder. In the past I've had to do some gross hacks. For example, this gem is from the MythNetTV code:

    # Modify the XML to work around namespace handling bugs in FeedParser
    lines = []
    re_mediacontent = re.compile('(.*)<media:content([^>]*)/ *>(.*)')
    for line in xmllines:
        m = re_mediacontent.match(line)
        count = 1
        while m:
            line = '%s<media:wannabe%d>%s</media:wannabe%d>%s' % (
                m.group(1), count, m.group(2), count, m.group(3))
            m = re_mediacontent.match(line)
            count = count + 1
        lines.append(line)

    # Parse the modified XML
    xml = ''.join(lines)
    parser = feedparser.parse(xml)

Which is horrible, but works.

This time around the problem is that I am having trouble getting to the gr:annotation tags in my Google Reader shared items feed. How annoying. In the case of the Google Reader feed, the problem seems to be that the annotation is presented like this:

    <gr:annotation><content type="html">Awesome. Canberra has needed something better than buses between the towncenters for a while, and light rail seems like a great way to do it. I…
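
To make it clearer what the MythNetTV workaround quoted above does to a single line, here is a minimal standalone sketch using only the standard library. The helper name `rewrite_media_content` and the sample line are made up for illustration; only the regular expression and the replacement format come from the snippet itself.

```python
import re

# Same pattern as the MythNetTV snippet: a self-closing <media:content .../> tag.
RE_MEDIACONTENT = re.compile(r'(.*)<media:content([^>]*)/ *>(.*)')


def rewrite_media_content(line):
    """Hypothetical helper: rename every <media:content .../> on one line."""
    count = 1
    m = RE_MEDIACONTENT.match(line)
    while m:
        # The greedy leading (.*) means the LAST tag on the line matches first,
        # so numbering runs from right to left.
        line = '%s<media:wannabe%d>%s</media:wannabe%d>%s' % (
            m.group(1), count, m.group(2), count, m.group(3))
        m = RE_MEDIACONTENT.match(line)
        count += 1
    return line


# Illustrative input only, not taken from a real feed.
sample = '<item><media:content url="a"/><media:content url="b"/></item>'
print(rewrite_media_content(sample))
```

Each self-closing `media:content` tag becomes a uniquely named `media:wannabeN` element, and the original attribute string ends up as that element's text rather than as attributes. Because `.` does not match a newline, a rewritten line also loses its trailing newline, which does no harm once the lines are joined back together.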