scrapy: exceptions.AttributeError: unicode object has no attribute dont_filter

2024/11/17 15:56:55

In scrapy, I am getting the error exceptions.AttributeError: 'unicode' object has no attribute 'dont_filter'. After searching around, I found this answer (which made sense, as it was the only bit of code I modified before getting the error) and changed my code accordingly. I changed start_requests to yield the values in the list one at a time instead of returning the whole list, but I'm still getting the error. Any ideas?

def start_requests(self):
    connection = pymongo.Connection(settings['MONGODB_SERVER'], settings['MONGODB_PORT'])
    db = connection[settings['MONGODB_DB']]
    collection = db[settings['MONGODB_COLLECTION']]
    for el in [i['url'] for i in collection.find({}, {'_id': 0, 'url': 1})]:
        yield el

I have checked the other parts of the code and confirmed that everything else is fine.

Traceback:

[-] Unhandled Error
Traceback (most recent call last):
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 93, in start
    self.start_reactor()
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 130, in start_reactor
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1192, in run
    self.mainLoop()
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/utils/reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 120, in _next_request
    self.crawl(request, spider)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 176, in crawl
    self.schedule(request, spider)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 182, in schedule
    return self.slot.scheduler.enqueue_request(request)
  File "/home/myName/scrapy-test/venv/local/lib/python2.7/site-packages/scrapy/core/scheduler.py", line 48, in enqueue_request
    if not request.dont_filter and self.df.request_seen(request):
exceptions.AttributeError: 'unicode' object has no attribute 'dont_filter'

Answer

start_requests is supposed to yield individual Request objects, not just individual URLs. But each el in your code is apparently a URL. Try changing

yield el

to

yield self.make_requests_from_url(el)

(see the question you link to for an example of this)
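
To see why a bare URL fails, here is a minimal sketch that reproduces the check from the last frame of the traceback without needing Scrapy installed. FakeRequest and FakeScheduler are hypothetical stand-ins written for this illustration, not Scrapy's real classes, and the example.com URLs are placeholders. It is written for Python 3, so the message names 'str' rather than 'unicode'.

```python
# Simplified stand-ins, NOT Scrapy's real classes: they only mimic the
# attribute access seen in the enqueue_request frame of the traceback.


class FakeRequest:
    """Hypothetical minimal request: a URL plus the dont_filter flag."""

    def __init__(self, url, dont_filter=False):
        self.url = url
        self.dont_filter = dont_filter


class FakeScheduler:
    """Hypothetical scheduler that skips already-seen URLs unless dont_filter is set."""

    def __init__(self):
        self.seen = set()
        self.queue = []

    def enqueue_request(self, request):
        # Same shape as the failing line: dont_filter is read first,
        # so anything that is not a request object fails right here.
        if not request.dont_filter and request.url in self.seen:
            return False
        self.seen.add(request.url)
        self.queue.append(request)
        return True


def start_requests(urls):
    """Wrap each bare URL in a request object before yielding it."""
    for url in urls:
        yield FakeRequest(url)


scheduler = FakeScheduler()

# Passing a bare string reproduces the AttributeError from the question.
try:
    scheduler.enqueue_request("http://example.com/a")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Passing request objects works; the repeated URL is filtered as a duplicate.
results = [
    scheduler.enqueue_request(request)
    for request in start_requests(["http://example.com/a", "http://example.com/a"])
]
```

In a real spider, the equivalent of FakeRequest(url) would be a Scrapy Request, for example yield scrapy.Request(el), which takes the URL as its first argument and also accepts dont_filter.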
