Question 1

I managed to scrape a list of URLs from a CSV file, but I have a problem: the scraping stops when it hits a broken link. It also prints a lot of None lines. Is it possible to get rid of them?

I would appreciate some help here. Thank you in advance!

Here is the code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup  # required to parse html
import requests  # required to make request

# read file
with open('urls.csv', 'r') as f:
    csv_raw_cont = f.read()

# split by line
split_csv = csv_raw_cont.split('\n')

# specify separator
separator = ";"

# iterate over each line
for each in split_csv:
    # specify the row index
    url_row_index = 0  # in our csv example file the url is the first row so we set 0

    # get the url
    url = each.split(separator)[url_row_index]

    # fetch content from server
    html = requests.get(url).content

    # soup fetched content
    soup = BeautifulSoup(html, 'lxml')

    tags = soup.find("div", {"class": "productsPicture"}).findAll("a")

    for tag in tags:
        print(tag.get('href'))

The output, including the error, looks like this:

https://www.tennis-point.com/asics-gel-resolution-7-all-court-shoe-men-white-silver-02013802720000.html
None
https://www.tennis-point.com/cep-ultralight-run-sports-socks-men-black-light-green-12143000063000.html
None
https://www.tennis-point.com/asics-gel-solution-speed-3-clay-court-shoe-men-white-grey-02013802634000.html
None
https://www.tennis-point.com/asics-gel-solution-speed-3-all-court-shoe-men-white-silver-02013802723000.html
None
https://www.tennis-point.com/asics-gel-challenger-9-indoor-carpet-shoe-men-white-grey-02012401735000.html
None
https://www.tennis-point.com/asics-gel-court-speed-clay-court-shoe-men-dark-blue-yellow-02014202833000.html
None
https://www.tennis-point.com/asics-gel-court-speed-all-court-shoe-men-white-silver-02014202832000.html
None
Traceback (most recent call last):
  File "/Users/imaging-adrian/Desktop/Python Scripts/close_to_work.py", line 33, in <module>
    tags = soup.find("div", {"class": "productsPicture"}).findAll("a")
AttributeError: 'NoneType' object has no attribute 'findAll'
[Finished in 3.7s with exit code 1]
[shell_cmd: python -u "/Users/imaging-adrian/Desktop/Python Scripts/close_to_work.py"]
[dir: /Users/imaging-adrian/Desktop/Python Scripts]
[path: /Users/imaging-adrian/anaconda3/bin:/Library/Frameworks/Python.framework/Versions/3.6/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/munki]

The links inside my CSV file look like this:

https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E701Y-0193;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E601N-4907;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E601N-0193;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E600N-0193;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E326Y-0174;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E801N-4589;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E800N-0193;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E800N-9093;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E800N-4589;
https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E804N-9095;
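
For reference, here is a minimal sketch of how lines in this format parse with Python's standard csv module. It uses two of the URLs above as inline sample data instead of urls.csv, and the names SAMPLE and read_urls are illustrative only. It suggests that the trailing ; produces an empty second field and that a blank line produces an empty row, so the URL is the first field once empty rows are skipped:

```python
import csv
import io

# Inline sample standing in for urls.csv: two lines from the list above,
# plus a blank line like the one a stray newline can leave behind.
SAMPLE = (
    "https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E701Y-0193;\n"
    "\n"
    "https://www.tennis-point.com/index.php?stoken=737F2976&lang=1&cl=search&searchparam=E601N-4907;\n"
)


def read_urls(text_stream, delimiter=";"):
    """Return the first field of every non-empty row."""
    urls = []
    for row in csv.reader(text_stream, delimiter=delimiter):
        # csv.reader yields [] for a blank line; skip it and empty first fields.
        if not row or not row[0].strip():
            continue
        urls.append(row[0].strip())
    return urls


urls = read_urls(io.StringIO(SAMPLE))
print(len(urls))
```

Splitting the raw text on '\n' by hand keeps such blank lines as empty strings, and an empty string is not a URL that requests can fetch, so skipping them before the request is one way to avoid that failure.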

Question 2

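Here is a working version. soup.find() returns None when a page has no productsPicture div, which is what caused the AttributeError, so the code now checks for that and skips the page. It also writes only links that actually have an href, which removes the None lines: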

from bs4 import BeautifulSoup
import requests
import csv

with open('urls.csv', 'r') as csvFile, open('results.csv', 'w', newline='') as results:
    reader = csv.reader(csvFile, delimiter=';')
    writer = csv.writer(results)

    for row in reader:
        # skip blank lines, which csv.reader returns as empty rows
        if not row:
            continue

        # get the url
        url = row[0]

        # fetch content from server
        html = requests.get(url).content

        # soup fetched content
        soup = BeautifulSoup(html, 'html.parser')

        # find() returns None when the div is missing, so check before using it
        divTag = soup.find("div", {"class": "productsPicture"})

        if divTag:
            tags = divTag.findAll("a")
        else:
            continue

        for tag in tags:
            res = tag.get('href')
            # tag.get() returns None for <a> tags without an href
            if res is not None:
                writer.writerow([res])
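
The version above guards against a missing productsPicture div, but a link that is truly broken (unreachable host, timeout, HTTP error status) can still raise inside requests.get and stop the loop. Below is a minimal sketch of one way to skip such URLs, assuming that skipping them with a message on stderr is acceptable. The names fetch_html and extract_links, the get parameter, and the 10-second timeout are illustrative choices, not part of the original answer:

```python
import sys

import requests
from bs4 import BeautifulSoup


def fetch_html(url, get=requests.get, timeout=10):
    """Return the page body as bytes, or None if the request fails.

    `get` is injectable so the function can be exercised without a network.
    """
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors.
        print("skipping {}: {}".format(url, exc), file=sys.stderr)
        return None
    return response.content


def extract_links(html):
    """Return every href found inside the productsPicture div."""
    soup = BeautifulSoup(html, "html.parser")
    div_tag = soup.find("div", {"class": "productsPicture"})
    if div_tag is None:
        return []  # page has no matching div
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [a["href"] for a in div_tag.find_all("a", href=True)]
```

Inside the loop, the fetch and parse steps could then become: call fetch_html(url), continue to the next row if it returns None, and otherwise write each link from extract_links(html) with writer.writerow([link]).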

Skip the error while scraping a list of URLs from a CSV
