Question 1

I get the error "urllib.error.HTTPError: HTTP Error 403: Forbidden" when scraping certain pages, and I understand that the fix is to send a User-Agent header with the request, e.g. hdr = {'User-Agent': 'Mozilla/5.0'}.

However, I can't make it work when the URLs I'm trying to scrape are listed in a separate source file. How and where can I add the User-Agent to the code below?

from bs4 import BeautifulSoup
import urllib.request as urllib2
import time

list_open = open("source-urls.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")
i = 0
for url in line_in_list:
    soup = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser')
    name = soup.find(attrs={'class': "name"})
    description = soup.find(attrs={'class': "description"})
    for text in description:
        print(name.get_text(), ';', description.get_text())
    # time.sleep(5)
    i += 1
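One thing worth checking in the loop above, independent of the header: if source-urls.txt ends with a newline, split("\n") leaves an empty string as the last entry, and urlopen('') raises ValueError (unknown url type). A small illustration, using made-up file contents as an assumption:

```python
import urllib.request

# Hypothetical contents of source-urls.txt, ending with a newline as most text files do.
read_list = "https://example.com/a\nhttps://example.com/b\n"

# split("\n") keeps an empty string after the trailing newline.
print(read_list.split("\n"))  # ['https://example.com/a', 'https://example.com/b', '']

# splitlines() plus a filter drops it, along with any blank or whitespace-only lines.
line_in_list = [line.strip() for line in read_list.splitlines() if line.strip()]
print(line_in_list)  # ['https://example.com/a', 'https://example.com/b']

# An empty string is not a valid URL for urllib.
try:
    urllib.request.Request('')
except ValueError as err:
    print(err)
```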

Answer

You can achieve the same thing with the requests library, which takes the headers directly as an argument to requests.get():

import requests
from bs4 import BeautifulSoup
hdrs = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
for url in line_in_list:  # line_in_list as read from source-urls.txt in the question
    resp = requests.get(url, headers=hdrs)  # the User-Agent is sent with every request
    soup = BeautifulSoup(resp.content, 'html.parser')
    name = soup.find(attrs={'class': "name"})
    description = soup.find(attrs={'class': "description"})
    print(name.get_text(), ';', description.get_text())
    # time.sleep(5)  # optional pause between requests (needs import time)
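If you would rather keep urllib, a sketch of the equivalent: the headers dictionary only takes effect when the URL is wrapped in a urllib.request.Request object, which is then passed to urlopen() in place of the bare string. The helper names build_request and scrape, the 10-second timeout and the short 'Mozilla/5.0' value are illustrative assumptions, not part of the original code.

```python
import time
import urllib.request

# Same idea as hdr in the question: a browser-like User-Agent string (assumed value).
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def build_request(url):
    """Wrap a URL in a Request object so the User-Agent header is sent with it."""
    return urllib.request.Request(url, headers=HEADERS)


def scrape(urls, pause=5):
    """Fetch each URL with the custom header, yielding (url, raw_html) pairs."""
    for url in urls:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            html = resp.read()
        yield url, html
        time.sleep(pause)  # optional delay between requests, like the commented-out sleep


# urllib stores header names via str.capitalize(), hence the 'User-agent' spelling here.
print(build_request('https://example.com/').get_header('User-agent'))
```

In the question's code this amounts to replacing urllib2.urlopen(url) with urllib2.urlopen(urllib2.Request(url, headers=hdr)); each raw_html from scrape(line_in_list) can then go to BeautifulSoup(raw_html, 'html.parser') as before.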

Hope it helps!
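Even with a browser-like User-Agent, some servers may still answer 403, so it can help to handle the error per URL instead of letting one blocked page end the whole loop. A minimal urllib sketch; fetch_or_none and its opener parameter are hypothetical names, and the opener argument exists only so the function can be exercised without network access:

```python
import urllib.error
import urllib.request

HEADERS = {'User-Agent': 'Mozilla/5.0'}  # assumed header value, as in the question


def fetch_or_none(url, opener=urllib.request.urlopen):
    """Return the page bytes, or None if the server replies with an HTTP error such as 403."""
    req = urllib.request.Request(url, headers=HEADERS)
    try:
        with opener(req, timeout=10) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # err.code holds the status (403 for Forbidden); err.reason holds the message.
        print(f'skipping {url}: HTTP {err.code} {err.reason}')
        return None
```

With requests, a 403 does not raise by default: check resp.status_code or call resp.raise_for_status() before parsing, otherwise BeautifulSoup parses the error page and soup.find() may return None.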
