Scrapy: how to change spider settings after crawling has started?

2024/10/6 18:29:07

I can't change spider settings in the parse method, but there must be a way to do it.

For example:

    class SomeSpider(BaseSpider):
        name = 'mySpider'
        allowed_domains = ['example.com']
        start_urls = ['http://example.com']

        settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.FirstPipeline']
        print settings['ITEM_PIPELINES'][0]
        # printed 'myproject.pipelines.FirstPipeline'

        def parse(self, response):
            # ...some code
            settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.SecondPipeline']
            print settings['ITEM_PIPELINES'][0]
            # printed 'myproject.pipelines.SecondPipeline'
            item = MyItem()
            item['name'] = 'Name for SecondPipeline'

But the item is still processed by FirstPipeline; the new ITEM_PIPELINES value has no effect. How can I change settings after crawling has started? Thanks in advance!

Answer

If you want different spiders to have different pipelines, you can set a `pipelines` list attribute on each spider that defines which pipelines apply to it. Then, in each pipeline, check whether it is enabled for the current spider:

    class MyPipeline(object):
        def process_item(self, item, spider):
            if self.__class__.__name__ not in getattr(spider, 'pipelines', []):
                return item
            ...
            return item

    class MySpider(CrawlSpider):
        pipelines = set(['MyPipeline', 'MyPipeline3'])

If you want different items to be processed by different pipelines, you can do this:

    class MyPipeline2(object):
        def process_item(self, item, spider):
            if isinstance(item, MyItem):
                ...
                return item
            return item
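To see how the per-spider check above behaves, here is a minimal, framework-free sketch. The `FakeSpider` stand-in and the `processed_by` key are hypothetical illustrations; in a real project the spider would be a `scrapy.Spider` subclass and the pipelines would be registered in `ITEM_PIPELINES`.

```python
# Sketch of the per-spider pipeline filter, runnable without Scrapy.
# FakeSpider stands in for a real scrapy.Spider subclass.

class FakeSpider(object):
    # Only pipelines named here should modify this spider's items.
    pipelines = {'MyPipeline'}

class MyPipeline(object):
    def process_item(self, item, spider):
        if self.__class__.__name__ not in getattr(spider, 'pipelines', []):
            return item  # not enabled for this spider: pass through untouched
        item['processed_by'] = self.__class__.__name__  # hypothetical marker field
        return item

class MyPipeline3(object):
    def process_item(self, item, spider):
        if self.__class__.__name__ not in getattr(spider, 'pipelines', []):
            return item
        item['processed_by'] = self.__class__.__name__
        return item

spider = FakeSpider()
item = {}
# Scrapy would chain process_item calls in ITEM_PIPELINES order:
item = MyPipeline().process_item(item, spider)
item = MyPipeline3().process_item(item, spider)
print(item)  # only MyPipeline marked the item
```

Because `MyPipeline3` is not in the spider's `pipelines` set, it returns the item unchanged, so only `MyPipeline` leaves its mark.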
