Calling Scrapy Spider from Django

2024/10/11 8:30:54

I have a project with a Django folder and a Scrapy folder in the same workspace:

my_project/
    django_project/
        django_project/
            settings.py
        app1/
        app2/
        manage.py
        ...
    scrapy_project/
        scrapy_project/
            settings.py
        scrapy.cfg
        ...

I've already connected Scrapy to my Django app1 model, so every time I run my spider it stores the collected data in my PostgreSQL database. This is how my Scrapy project accesses the Django model:

# in my_project/scrapy_project/scrapy_project/settings.py
import sys
import os
import django

sys.path.append('/../../django_project')
os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'
django.setup()

Everything works great when I call the spider from the command line, but not when I try to run the spider as a script from a Django view or a Celery task, for example:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl('spider_name')
process.start()

I get an error:

KeyError: 'Spider not found: spider_name'

I think I'm supposed to tell Django where Scrapy is located (as I've done in the Scrapy settings), but I don't know how. To be honest, I'm not even sure that the folder structure I designed for this project is the right choice.

Answer

Follow the example in the Scrapy docs and pass the spider class itself to crawl() instead of its name:

from my_project.scrapy_project.spiders import MySpider
...
process.crawl(MySpider)
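
If you would rather keep looking the spider up by name, the likely cause of the KeyError is that get_project_settings() cannot find the Scrapy project when it is called from inside Django: there is no scrapy.cfg in the working directory, so SPIDER_MODULES stays empty. A minimal sketch of one way around that, assuming the layout from the question (the helper names configure_scrapy_env and run_spider, the scrapy_root argument, and the default settings module 'scrapy_project.settings' are illustrative, not part of Scrapy):

```python
import os
import sys
from pathlib import Path


def configure_scrapy_env(scrapy_root, settings_module="scrapy_project.settings"):
    """Make the Scrapy project importable and point Scrapy at its settings.

    scrapy_root is assumed to be the folder that contains scrapy.cfg,
    e.g. my_project/scrapy_project in the question's layout.
    """
    root = str(Path(scrapy_root).resolve())
    if root not in sys.path:
        # Lets Python import the inner scrapy_project package.
        sys.path.insert(0, root)
    # get_project_settings() checks this variable before searching for scrapy.cfg.
    os.environ.setdefault("SCRAPY_SETTINGS_MODULE", settings_module)
    return root


def run_spider(spider_name, scrapy_root):
    """Run one spider by name; blocks until the crawl has finished."""
    configure_scrapy_env(scrapy_root)
    # Imported here so the path and environment are set up first.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl(spider_name)
    process.start()
```

You would then call something like run_spider('spider_name', '/path/to/my_project/scrapy_project') from the Celery task. Keep in mind that process.start() blocks and that the Twisted reactor cannot be restarted within the same process, so in a long-lived Django or Celery worker this will probably only work once per process; starting the crawl in a separate process is a common workaround.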
