Scrapy with selenium, webdriver failing to instantiate

2024/11/15 11:10:37

I am trying to use Selenium/PhantomJS with Scrapy and keep running into errors. For example, take the following code snippet:

    def parse(self, response):
        while True:
            try:
                driver = webdriver.PhantomJS()
                # do some stuff
                driver.quit()
                break
            except (WebDriverException, TimeoutException):
                try:
                    driver.quit()
                except UnboundLocalError:
                    print("Driver failed to instantiate")
                time.sleep(3)
                continue

Much of the time the driver seems to fail to instantiate (so driver is unbound, hence the UnboundLocalError), and I get the following message, along with the print message I added:

Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x7fbb28dc17d0>> ignored

Googling around, it seems everyone suggests updating phantomjs, which I have (1.9.8 built from source). Would anyone know what else could be causing this problem and a suitable diagnosis?

Answer

The reason for this behavior is how the PhantomJS driver's Service class is implemented.

It defines a __del__ method that calls self.stop():

    def __del__(self):
        # subprocess.Popen doesn't send signal on __del__;
        # we have to try to stop the launched process.
        self.stop()

self.stop(), in turn, assumes the service instance is still alive and tries to access its attributes:

    def stop(self):
        """Cleans up the process"""
        if self._log:
            self._log.close()
            self._log = None
        #If its dead dont worry
        if self.process is None:
            return
        ...
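To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of AttributeError without Selenium. FakeService and attribute_error_from_stop are hypothetical names used only for illustration. The class is a simplified stand-in, not Selenium's actual code, and it assumes that process would normally be assigned by a start-up step that never ran.

```python
class FakeService(object):
    """Simplified stand-in for the driver's Service class (not Selenium's code)."""

    def __init__(self):
        self._log = None
        # Assumption for this sketch: `process` is assigned by a later
        # start-up step. That step never runs here, so the attribute
        # does not exist on the instance.

    def stop(self):
        if self._log:
            self._log.close()
            self._log = None
        # Raises AttributeError because `process` was never assigned.
        if self.process is None:
            return

    def __del__(self):
        # Same pattern as in the answer: cleanup on garbage collection.
        # An exception raised here is reported on stderr and then ignored.
        self.stop()


def attribute_error_from_stop():
    """Call stop() on a half-initialised service and return the error text."""
    service = FakeService()
    try:
        service.stop()
    except AttributeError as exc:
        return str(exc)
    return None


message = attribute_error_from_stop()
print(message)
```

When the half-initialised instance is garbage-collected, __del__ hits the same error, and Python reports it on stderr as an ignored exception, much like the message quoted in the question.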

The exact same problem is described in this thread:

  • Python attributeError on __del__

What you should do is to silently ignore AttributeError occurring while quitting the driver instance:

    try:
        driver.quit()
    except AttributeError:
        pass
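One way to apply this inside a retry loop is to wrap the cleanup in a small helper. The sketch below is only an illustration: safe_quit, BrokenDriver and WorkingDriver are hypothetical names, not part of Selenium, and the two driver classes are stand-ins so the example runs without a browser.

```python
def safe_quit(driver):
    """Quit `driver` if there is one, ignoring the AttributeError described above.

    Hypothetical helper, not part of Selenium.
    Returns True if quit() completed, False otherwise.
    """
    if driver is None:
        # The driver was never instantiated, so there is nothing to clean up.
        return False
    try:
        driver.quit()
    except AttributeError:
        # The underlying service was only partly initialised.
        return False
    return True


class BrokenDriver(object):
    """Stand-in for a driver whose service never got a `process` attribute."""

    def quit(self):
        self.process  # raises AttributeError: attribute was never set


class WorkingDriver(object):
    """Stand-in for a driver that shuts down cleanly."""

    def quit(self):
        pass


results = [safe_quit(None), safe_quit(BrokenDriver()), safe_quit(WorkingDriver())]
print(results)
```

If driver is set to None before each attempt in the loop from the question, the nested handler for UnboundLocalError should no longer be needed, since safe_quit(driver) covers both the "never created" and the "partly created" cases.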

The problem was introduced by this revision, which means that downgrading selenium to 2.40.0 would also help.

