Scrapy empty output

2024/9/20 9:38:55

I am trying to use Scrapy to extract data from a page, but I get empty output. What is the problem?

spider:

import scrapy


class Ratemds(scrapy.Spider):
    name = 'ratemds'
    allowed_domains = ['ratemds.com']
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.50747 OPRGX/60.0.3255.50747',
    }

    def start_requests(self):
        yield scrapy.Request(
            'https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md-greensboro-nc-us',
            callback=self.profile,
        )

    def profile(self, response):
        item = {
            'url': response.request.url,
            'Image': response.css('.doctor-profile-image::attr(src)').get(),
            'First_and_Last_Name': response.css('h1::text').get(),
        }
        yield item

output:

{'url': 'https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md-greensboro-nc-us', 'Image': None, 'First_and_Last_Name': None}
Answer

The problem is that this website is protected by a CAPTCHA. When you try to scrape it, you are redirected to a page like this one: error_page

As you can see, that page does not contain the information you are looking for, which is why the selectors return None. To collect data from a site like this, you can try the following:

  1. Use scrapy-selenium or Splash to collect the information.
  2. Use a CAPTCHA-solving service such as Death By Captcha, Anti-Captcha, or similar.
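
Before trying either option, it can help to confirm that the spider really received the CAPTCHA page and not the profile. Below is a minimal sketch of such a check. The helper name `looks_blocked` and its heuristics are my own assumptions, not part of Scrapy: it flags a response when the URL path changed or when every scraped field except `url` is None.

```python
from urllib.parse import urlsplit


def looks_blocked(requested_url, final_url, item):
    """Heuristic (hypothetical helper): True if this is probably not the profile page.

    Flags the response when the URL path changed (for example, a redirect
    to a CAPTCHA page) or when every field other than 'url' is None.
    """
    requested_path = urlsplit(requested_url).path.rstrip('/')
    final_path = urlsplit(final_url).path.rstrip('/')
    redirected = requested_path != final_path

    # Ignore the 'url' field: it is filled in even on a blocked page.
    fields = [value for key, value in item.items() if key != 'url']
    all_empty = bool(fields) and all(value is None for value in fields)

    return redirected or all_empty


# Possible use inside the spider's profile() callback (assumption: Scrapy's
# redirect middleware stores the earlier URLs under meta['redirect_urls']):
#
#     original = response.meta.get('redirect_urls', [response.url])[0]
#     if looks_blocked(original, response.url, item):
#         self.logger.warning('Possibly blocked: %s', response.url)
#         return
#     yield item
```

With the output from the question (both `Image` and `First_and_Last_Name` are None), this check would report the page as blocked, so the spider could log a warning and skip the item.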
