Add a delay to a specific scrapy Request

2024/10/11 12:21:39

Is it possible to delay the retry of a particular Scrapy Request? I have a middleware which needs to defer the request of a page until a later time. I know how to do the basic deferral (end of queue), and also how to delay all requests (global settings), but I want to delay just this one individual request. This is most important near the end of the queue, where a simple deferral means the request immediately becomes the next request again.

Answer

Method 1

One way would be to add a middleware to your Spider (source, linked):

# File: middlewares.py
from twisted.internet import reactor
from twisted.internet.defer import Deferred


class DelayedRequestsMiddleware(object):
    def process_request(self, request, spider):
        delay_s = request.meta.get('delay_request_by', None)
        if not delay_s:
            return
        deferred = Deferred()
        reactor.callLater(delay_s, deferred.callback, None)
        return deferred

Which you could later use in your Spider like this:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {'middlewares.DelayedRequestsMiddleware': 123},
    }

    def start_requests(self):
        # This request will have itself delayed by 5 seconds
        yield scrapy.Request(url='http://quotes.toscrape.com/page/1/',
                             meta={'delay_request_by': 5})
        # This request will not be delayed
        yield scrapy.Request(url='http://quotes.toscrape.com/page/2/')

    def parse(self, response):
        ...  # Process results here
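Since the question is about a middleware that defers one request, it may help to see how that middleware could tag the request for DelayedRequestsMiddleware. The following is only a minimal sketch: the helper name delayed_meta is hypothetical, and the only thing it relies on from the answer is the 'delay_request_by' meta key.

```python
def delayed_meta(meta, delay_s):
    """Return a copy of a request's meta dict with a per-request delay set.

    'delay_request_by' is the key read by DelayedRequestsMiddleware in
    Method 1. The helper name delayed_meta is hypothetical.
    """
    new_meta = dict(meta)  # copy, so the original request's meta is untouched
    new_meta['delay_request_by'] = delay_s
    return new_meta


# Assumed usage inside your own middleware (needs Scrapy, so not run here):
#     return request.replace(meta=delayed_meta(request.meta, 30),
#                            dont_filter=True)
```

Returning the replaced request from your own middleware reschedules it, and the delay is then applied when it passes through DelayedRequestsMiddleware again. Passing dont_filter=True is an assumption that you want the duplicate filter to let the repeated URL through.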

Method 2

You could do this with a custom retry middleware (source); you just need to override the process_response method of the built-in RetryMiddleware:

from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message


class CustomRetryMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        if request.meta.get('dont_retry', False):
            return response
        if response.status in self.retry_http_codes:
            reason = response_status_message(response.status)
            # Your delay code here, for example sleep(10) or polling server until it is alive
            return self._retry(request, reason, spider) or response
        return response
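The "polling server until it is alive" idea from the placeholder comment could look roughly like the sketch below. The function name wait_until_alive and its parameters are hypothetical, and the is_alive check is left to the caller. Note that a blocking wait like this (or a plain sleep(10)) stalls the Twisted reactor, so every other in-flight request is paused too; the Deferred approach in Method 1 avoids that.

```python
import time


def wait_until_alive(is_alive, timeout_s=60.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_alive() until it returns True or timeout_s has elapsed.

    Returns True if the check succeeded, False if the timeout was reached.
    The sleep and clock parameters exist only so they can be swapped out
    in tests; all names here are hypothetical.
    """
    deadline = clock() + timeout_s
    while True:
        if is_alive():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)  # blocking: nothing else runs in the meantime
```

Inside process_response, this would be called just before self._retry(...), with is_alive being whatever check makes sense for your server.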

Then enable it instead of the default RetryMiddleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'myproject.middlewarefilepath.CustomRetryMiddleware': 550,
}