I'm trying to make a desktop notifier, and for that I'm scraping news from a site. When I run the program, I get the following error.
news[child.tag] = child.encode('utf8')
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'
How do I resolve it? I'm completely new to this. I tried searching for solutions, but none of them worked for me.
Here is my code:
import requests
import xml.etree.ElementTree as ET

# url of news rss feed
RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"

def loadRSS():
    '''utility function to load RSS feed'''
    # create HTTP request response object
    resp = requests.get(RSS_FEED_URL)
    # return response content
    return resp.content

def parseXML(rss):
    '''utility function to parse XML format rss feed'''
    # create element tree root object
    root = ET.fromstring(rss)
    # create empty list for news items
    newsitems = []
    # iterate news items
    for item in root.findall('./channel/item'):
        news = {}
        # iterate child elements of item
        for child in item:
            # special checking for namespace object content:media
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.encode('utf8')
        newsitems.append(news)
    # return news items list
    return newsitems

def topStories():
    '''main function to generate and return news items'''
    # load rss feed
    rss = loadRSS()
    # parse XML
    newsitems = parseXML(rss)
    return newsitems
You're trying to convert a str to bytes, and then store those bytes in a dictionary. The problem is that the object you're doing this to is an xml.etree.ElementTree.Element, not a str.
You probably meant to get the text from within or around that element, and then encode() that.
The docs suggest using the itertext() method:
''.join(child.itertext())
This will evaluate to a str, which you can then encode().
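To show where that expression would go, here is a minimal sketch of the parsing loop with itertext() in place of encode(). The name parse_items and the inline sample feed are made up for illustration (the real feed may have more or different elements), and using child.get('url') instead of child.attrib['url'] is my own choice to avoid a KeyError when the attribute is missing.

```python
import xml.etree.ElementTree as ET

MEDIA_CONTENT_TAG = "{http://search.yahoo.com/mrss/}content"


def parse_items(rss):
    """Return one dict per <item>, keeping element text as str."""
    root = ET.fromstring(rss)
    newsitems = []
    for item in root.findall("./channel/item"):
        news = {}
        for child in item:
            if child.tag == MEDIA_CONTENT_TAG:
                # .get() returns None instead of raising if 'url' is absent
                news["media"] = child.get("url")
            else:
                # Join all text inside the element; the result is a str
                news[child.tag] = "".join(child.itertext())
        newsitems.append(news)
    return newsitems


# Made-up sample feed, standing in for the bytes returned by loadRSS()
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Sample headline</title>
      <link>http://example.com/story.html</link>
      <media:content url="http://example.com/image.jpg"/>
    </item>
  </channel>
</rss>"""

items = parse_items(SAMPLE_RSS)
print(items)
```

Every value in the resulting dictionaries is a plain str, so nothing in the loop needs encode() at all.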
Note that the text and tail attributes might not contain text (emphasis added):
Their values are usually strings but may be any application-specific object.
If you want to use those attributes, you'll have to handle None or non-string values:
head = '' if child.text is None else str(child.text)
tail = '' if child.tail is None else str(child.tail)
# Do something with head and tail...
Even this is not really enough. If text or tail contain bytes objects of some unexpected (or plain wrong) encoding, this will raise a UnicodeEncodeError.
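If you do want to guard against that, one option is a small helper that decodes bytes explicitly instead of passing them to str(). This is only a sketch: as_text is a hypothetical name, and both the UTF-8 default and the errors="replace" policy are assumptions you would adjust to your data.

```python
import xml.etree.ElementTree as ET


def as_text(value, encoding="utf-8"):
    """Coerce an Element's text or tail value to a str (hypothetical helper)."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Assumption: the bytes are UTF-8; undecodable bytes become U+FFFD
        return value.decode(encoding, errors="replace")
    return str(value)


elem = ET.fromstring("<p>head<b>bold</b>tail</p>")
bold = elem[0]

head = as_text(elem.text)   # text before the first child of <p>
tail = as_text(bold.tail)   # text that follows the closing </b>
empty = as_text(bold[0].text if len(bold) else None)  # no children, so None
```

With errors="replace" the helper never raises on bad bytes, at the cost of silently substituting replacement characters; use errors="strict" if you would rather fail loudly.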
Strings versus Bytes
I suggest leaving the text as a str, and not encoding it at all. Encoding text to a bytes object is intended as the last step before writing it to a binary file, a network socket, or some other hardware.
For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (36 minute video from PyCon US 2012). He covers both Python 2 and 3.
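As a rough illustration of "encode as the last step", the sketch below keeps the text as a str inside the dictionary and only encodes at the point where bytes are actually required. The io.BytesIO buffer is just a stand-in for a binary file or socket, and the sample title is invented.

```python
import io

# Keep text as str while the program works with it
news = {"title": "Ayushmann Khurrana: \u201ca fun ride\u201d"}

# Encode only at the boundary, when writing to something that wants bytes
buffer = io.BytesIO()  # stand-in for a binary file or a network socket
buffer.write(news["title"].encode("utf-8"))

data = buffer.getvalue()
round_trip = data.decode("utf-8")  # decoding at the boundary gives str back
```

Everything between reading and writing (dictionary lookups, string formatting, the notifier's display code) then deals with str only.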
Example Output
Using the child.itertext() method, and not encoding the strings, I got a reasonable-looking list-of-dictionaries from topStories():
[...,
 {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                 'been “a fun ride”; adds success is a lousy teacher while '
                 'failure is “your friend, philosopher and guide”.',
  'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
  'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
  'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
  'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
  'title': "I am a hardcore realist, and that's why I feel my journey "
           'has been a joyride: Ayushmann...'},
]