Scrapy process.crawl() to export data to json

2024/10/10 7:21:56

This might be a sub-question of "Passing arguments to process.crawl in Scrapy python", but the author there accepted an answer that doesn't address the sub-question I'm asking.

Here's my problem: I cannot use scrapy crawl mySpider -a start_urls=myUrl -o myData.json
Instead, I need to use crawlerProcess.crawl(spider). I have already figured out several ways to pass the arguments (and that part is answered in the question I linked anyway), but I can't work out how to tell it to dump the data into myData.json, i.e. the equivalent of the -o myData.json part.
Does anyone have a suggestion? Or am I just not understanding how it is supposed to work?

Here is the code:

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

spider = challenges(start_urls=["http://www.myUrl.html"])
crawlerProcess.crawl(spider)
# For now I am just trying to get this bit of code to work, but it will become a loop later.

dispatcher.connect(handleSpiderIdle, signals.spider_idle)
log.start()

print "Starting crawler."
crawlerProcess.start()
print "Crawler stopped."
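The install(), configure(), dispatcher.connect() and log.start() calls above appear to come from Scrapy releases before 1.0 and are not available in current versions. As a rough sketch, assuming Scrapy 2.x, the same spider_idle hookup can be done on the crawler object itself. The names handle_spider_idle and start_with_idle_handler are hypothetical; the spider class and URLs are your own.

```python
# Sketch for Scrapy 2.x (assumed): connect a spider_idle handler per crawler
# instead of through the old global `dispatcher`.
# handle_spider_idle and start_with_idle_handler are hypothetical names.


def handle_spider_idle(spider):
    """Called each time the spider runs out of requests to process."""
    spider.logger.info("Spider %s is idle", spider.name)


def start_with_idle_handler(spider_cls, start_urls, settings=None):
    # Imported here so the module can be loaded without Scrapy installed.
    from scrapy import signals
    from scrapy.crawler import CrawlerProcess

    # CrawlerProcess sets up logging itself, so no log.start() is needed,
    # and there is no separate install()/configure() step.
    process = CrawlerProcess(settings or {})
    crawler = process.create_crawler(spider_cls)
    crawler.signals.connect(handle_spider_idle, signal=signals.spider_idle)

    # Keyword arguments are forwarded to the spider's constructor.
    process.crawl(crawler, start_urls=start_urls)
    print("Starting crawler.")
    process.start()  # blocks until the crawl finishes
    print("Crawler stopped.")
```

Passing a Crawler rather than a spider class to process.crawl() is what gives you access to crawler.signals before the crawl starts.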
Answer

You need to specify it in the settings (the -o option of scrapy crawl just sets the feed export settings for you):

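from scrapy.crawler import CrawlerProcess
process = CrawlerProcess({'FEED_URI': 'file:///tmp/export.json',
                          'FEED_FORMAT': 'json'})  # without this, older releases default to JSON Lines
process.crawl(MySpider)  # MySpider: your spider class
process.start()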
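On Scrapy 2.1 and later, FEED_URI and FEED_FORMAT are deprecated in favor of the FEEDS setting. A minimal sketch under that assumption, which also passes start_urls the way the question wants: keyword arguments given to process.crawl() reach the spider's constructor, and the base scrapy.Spider stores them as attributes. feed_settings and run_spider are hypothetical helper names; myData.json and the URL are placeholders.

```python
# Sketch for Scrapy >= 2.1 (assumed), where FEEDS replaces FEED_URI/FEED_FORMAT.
# feed_settings and run_spider are hypothetical helper names.


def feed_settings(path, fmt="json"):
    """Settings roughly equivalent to `scrapy crawl <spider> -o <path>`.

    FEEDS maps each output URI to its options; "format" selects the exporter
    ("json" writes one JSON array, "jsonlines" writes one object per line).
    """
    return {"FEEDS": {path: {"format": fmt, "encoding": "utf8"}}}


def run_spider(spider_cls, start_urls, path="myData.json"):
    """Run one spider from a script and export its items to `path`."""
    # Imported here so feed_settings() has no Scrapy dependency.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(feed_settings(path))
    # Extra keyword arguments go to the spider's __init__, which stores them
    # as attributes: the script counterpart of `-a start_urls=...`.
    process.crawl(spider_cls, start_urls=start_urls)
    process.start()  # blocks until the crawl finishes
```

For example, run_spider(MySpider, ["http://www.myUrl.html"]). For the loop mentioned in the question, call process.crawl() once per spider or URL batch and then process.start() a single time, because the Twisted reactor behind CrawlerProcess cannot be restarted.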
https://en.xdnf.cn/q/69921.html

Related Q&A

Embedding Python in C: Error in linking - undefined reference to PyString_AsString

I am trying to embed a Python program inside a C program. My OS is Ubuntu 14.04. I try to embed the Python 2.7 and Python 3.4 interpreters in the same C code base (as separate applications). The compilation a…

How can I add properties to a class using a decorator that takes a list of names as argument?

I would like to add many dummy-properties to a class via a decorator, like this:def addAttrs(attr_names):def deco(cls):for attr_name in attr_names:def getAttr(self):return getattr(self, "_" +…

How to reshape only last dimensions in numpy?

Suppose I have A of shape (...,96) and want to reshape it into (...,32,3) keeping both lengths and number of preceding dimensions, if ever, intact. How do I do this? If I write np.reshape(A, (-1, 32, 2)) it…

Relative import of submodule

In Python, how do I perform the equivalent of the following: import http.client but using a relative import: from . import http.client import .http.client For a package http in the current package? I want …

Python regular expression to replace everything but specific words

I am trying to do the following with a regular expression:import re x = re.compile([^(going)|^(you)]) # words to replace s = I am going home now, thank you. # string to modify print re.sub(x, _, s)T…

How do I raise a window that is minimized or covered with PyGObject?

I'd been using the answer provided in the PyGTK FAQ, but that doesn't seem to work with PyGObject. For your convenience, here is a test case that works with PyGTK, and then a translated version that does…

How to bind multiple widgets with one bind in Tkinter?

I am wondering how to bind multiple widgets with one "bind". For example: I have three buttons and I want to change their color on hover. from Tkinter import * def SetColor(event): event.w…

Iterate a large .xz file line by line in python

I have a large .xz file (a few gigabytes). It's full of plain text. I want to process the text to create a custom dataset. I want to read it line by line because it is too big. Does anyone have an idea how to do…

Detect multiple circles in an image

I am trying to detect the count of water pipes in this picture. For this, I am trying to use OpenCV and Python-based detection. The results I am getting are a little confusing to me because the spread …

Need guidance with FilteredSelectMultiple widget

I am sorry if this question turns out to be a little broad, but since I am just learning Django (and I am just a hobbyist developer) I need some guidance which, I hope, will help someone like me in the futu…