urllib.error.HTTPError: HTTP Error 403: Forbidden

2024/10/5 19:49:59

I get the error "urllib.error.HTTPError: HTTP Error 403: Forbidden" when scraping certain pages, and I understand that sending a request header such as hdr = {'User-Agent': 'Mozilla/5.0'} is the usual fix.

However, I can't make it work when the URLs I'm trying to scrape are in a separate source file. How and where can I add the User-Agent to the code below?

from bs4 import BeautifulSoup
import urllib.request as urllib2
import time

list_open = open("source-urls.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")

i = 0
for url in line_in_list:
    soup = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser')
    name = soup.find(attrs={'class': "name"})
    description = soup.find(attrs={'class': "description"})
    for text in description:
        print(name.get_text(), ';', description.get_text())
#        time.sleep(5)
    i += 1

Answer

You can achieve the same thing with requests, which accepts the headers directly in each call:

import requests

# line_in_list, i and BeautifulSoup are defined/imported as in the question
hdrs = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}

for url in line_in_list:
    # Send the User-Agent header with every request
    resp = requests.get(url, headers=hdrs)
    soup = BeautifulSoup(resp.content, 'html.parser')
    name = soup.find(attrs={'class': "name"})
    description = soup.find(attrs={'class': "description"})
    for text in description:
        print(name.get_text(), ';', description.get_text())
#        time.sleep(5)
    i += 1
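
If you would rather stay with urllib.request as in the question, the header can probably be attached by wrapping each URL in a Request object before calling urlopen. This is only a minimal sketch: the helper names parse_urls, build_request and fetch_html are made up for illustration, the short 'Mozilla/5.0' value is an assumption (some sites may want a fuller browser string), and a 403 can have other causes that a header alone will not fix.

```python
import urllib.request

# Assumption: a generic browser-like User-Agent; adjust if the site rejects it.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def parse_urls(text):
    """Split file contents into URLs, dropping blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_request(url):
    """Wrap the URL in a Request object so the header is sent with it."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url, timeout=10):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

In the question's loop, urllib2.urlopen(url).read() would then become fetch_html(url), and the list of URLs would come from parse_urls(open("source-urls.txt").read()). Dropping blank lines matters because split("\n") leaves an empty string at the end when the file finishes with a newline, and an empty URL makes the request fail.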

Hope it helps!
