How do I fix scrapy Unsupported URL scheme error?

2024/9/20 12:17:43

I collect url from command python and then insert it into start_urls

from flask import Flask, jsonify, request
import scrapy
import subprocessclass ClassSpider(scrapy.Spider):name        = 'mySpider'#start_urls = []#pages      = 0news        = []def __init__(self, url, nbrPage):self.pages      = nbrPageself.start_urls = []self.start_urlsappend(url)def parse(self):...def run(self):subprocess.check_output(['scrapy', 'crawl', 'mySpider', '-a', f'url={self.start_urls}', '-a', f'nbrPage={self.pages}'])return self.newsapp = Flask(__name__)
data = []@app.route('/', methods=['POST'])
def getNews():mySpiderClass = ClassSpider(request.json['url'], 2)return jsonify({'data': mySpider.run()})if __name__ == "__main__":app.run(debug=True)

I got this error: raise not supported("unsupported url scheme %s: %s" % scrapy.exceptions.NotSupported: Unsupported URL scheme '': no handler available for that scheme

When I put a print('my urls List: ' + str(self.start_urls)), it prints a list of url like --> my urls List: ['www.googole.com']

Any help plz

Answer

I guess this happens because you first append url to self.start_urls and then you call ClassSpiders run method with your list self.start_urls which in turn appends the list to a list and you end up with a nested list instead of a list of strings.
To avoid this you should maybe change your __init__ method like this:

    def __init__(self, url, nbrPage):self.pages      = nbrPageself.url        = urlself.start_urls = []self.start_urls.append(url)

And then pass self.url instead of self.start_urls in run:

    def run(self):subprocess.check_output(['scrapy', 'crawl', 'mySpider', '-a', f'url={self.url}', '-a', f'nbrPage={self.pages}'])return self.news
https://en.xdnf.cn/q/119716.html

Related Q&A

comparing two timeseries dataframes based on some conditions in pandas

I have two timeseries dataframes df1 and df2: df1 = pd.DataFrame({date_1:[10/11/2017 0:00,10/11/2017 03:00,10/11/2017 06:00,10/11/2017 09:00],value_1:[5000,1500,np.nan,2000]})df1[date_1] = pd.to_dateti…

Game of Chance in Python 3.x?

I have this problem in my python code which is a coinflip game, the problem is that when It asks, "Heads or Tails?" and I just say 1 or Heads(same for 2 and Tails) without quotation marks an…

Count occurence of a word by ID in python

Following is the content of a file,My question is how to count the number of occurences for the word "optimus" for different IDs ID67 DATEUID Thank you for choosing Optimus prime. Please w…

ModuleNotFoundError: No module named plyer in Python

I am trying to write a program notify.py (location: desktop) that uses plyer library to get a notification on windows 10. I used pip install plyer and am using vs code to run the program but I get an e…

Floating point to 16 bit Twos Complement Binary, Python

so I think questions like this have been asked before but Im having quite a bit of trouble getting this implemented. Im dealing with CSV files that contain floating points between -1 and 1. All of thes…

Flag the first non zero column value with 1 and rest 0 having multiple columns

Please assist with the belowimport pandas as pd df = pd.DataFrame({Grp: [1,1,1,1,2,2,2,2,3,3,3,4,4,4], Org1: [x,x,y,y,z,y,z,z,x,y,y,z,x,x], Org2: [a,a,b,b,c,b,c,c,a,b,b,c,a,a], Value: [0,0,3,1,0,1,0,5,…

How to split up data from a column in a csv file into two separate output csv files?

I have a .csv file, e.g.:ID NAME CATEGORIES 1, x, AB 2, xx, AA 3, xxx, BAHow would I get this to form two output .csv files based on the category e.g.:File 1:ID NAME CATEGORY 1, x, A 2, xx, A 3, …

Discord.py spellcheck commands

Recently, I looked up Stack Overflow and found this code which can check for potential typos: from difflib import SequenceMatcher SequenceMatcher(None, "help", "hepl").ratio() # Ret…

Django Model Form doesnt seem to validate the BooleanField

In my model the validation is not validating for the boolean field, only one time product_field need to be checked , if two time checked raise validation error.product_field = models.BooleanField(defau…

For loop only shows the first object

I have a code that loops through a list of mails, but it is only showing the first result, even though there are also other matches. The other results require me to loop over the mails again only to re…