How to detect changed and new items in an RSS feed?

2024/10/9 0:45:49

When using feedparser or some other Python library to download and parse RSS feeds, how can I reliably detect new items and modified items?

So far I have seen new items appear in feeds with publication dates earlier than that of the latest item. I have also seen feed readers display the same item, republished with slightly different content, as separate items. I am not implementing a feed reader application; I just want a sane strategy for archiving feed data.

Answer

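It depends on how much you trust the feed source. feedparser provides an .id attribute for feed items (populated from the RSS guid or the Atom id element), and this attribute should be unique for both RSS and Atom sources. For an example, see feedparser's Atom docs. Though .id will cover most cases, it's conceivable that a source might publish multiple items with the same id. In that case, you don't have much choice but to hash the item's content. Hashing also covers modified items: an item whose id is already in your archive but whose content hash has changed has been edited since you last saw it.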
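A minimal sketch of this strategy, assuming entries are dict-like objects such as those in feedparser.parse(url).entries. The names HASHED_FIELDS, content_hash, entry_key and classify are hypothetical helpers, not part of feedparser, and the choice of fields to hash is an assumption to adjust per feed. Publication dates are deliberately ignored, since the question notes they are unreliable.

```python
import hashlib

# Fields hashed to detect edits. This selection is an assumption;
# adjust it to whatever the feeds you archive actually populate.
HASHED_FIELDS = ("title", "link", "summary")


def content_hash(entry):
    """Return a SHA-256 hex digest over the chosen fields of an entry."""
    h = hashlib.sha256()
    for field in HASHED_FIELDS:
        h.update(str(entry.get(field, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent fields cannot run together
    return h.hexdigest()


def entry_key(entry):
    """Prefer the feed-supplied id; fall back to the link, then the hash."""
    return entry.get("id") or entry.get("link") or content_hash(entry)


def classify(entries, archive):
    """Split entries into (new, modified).

    archive maps entry key -> last seen content hash and is updated in place.
    """
    new, modified = [], []
    for entry in entries:
        key, digest = entry_key(entry), content_hash(entry)
        if key not in archive:
            new.append(entry)
        elif archive[key] != digest:
            modified.append(entry)
        archive[key] = digest
    return new, modified
```

With feedparser installed, this could be called as classify(feedparser.parse(url).entries, archive), with archive persisted between runs (for example as JSON). If a source is known to reuse one id for different items, keying the archive on content_hash(entry) alone is the fallback the answer describes, at the cost of treating every edit as a new item.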
