How to get round the HTTP Error 403: Forbidden with urllib.request using Python 3

2024/9/20 0:49:12

Hi. Not every time, but sometimes, when trying to gain access to the LSE code I am thrown the ever-annoying HTTP Error 403: Forbidden message.

Does anyone know how I can overcome this issue using only standard Python modules (so sadly no Beautiful Soup)?

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
infile = urllib.request.urlopen(url)  # Open the URL
data = infile.read().decode('ISO-8859-1')  # Read the content as string decoded with ISO-8859-1
print(data)  # Print the data to the screen

However, every now and then this is the error I am shown:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/programming_practice/Assessment/Summative/removingThe403Error.py", line 5, in <module>
    webpage = urlopen(req).read().decode('ISO-8859-1')
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Process exited with code: 1

Link to a list of all the modules that are okay: https://docs.python.org/3.4/py-modindex.html

Many thanks in advance.

Answer

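This is probably due to mod_security or a similar server security feature that blocks requests from known bot user agents; by default urllib identifies itself as Python-urllib followed by the Python version. You need to send a browser-like User-Agent header so the request looks like it comes from a browser, not from Python urllib.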

Here, I corrected your code:

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"

# Open the URL as Browser, not as python urllib
page = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1')  # Read the content as string decoded with ISO-8859-1
print(data)  # Print the data to the screen
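
Since the 403 only shows up some of the time, it may also help to catch the error instead of letting the script exit. The following is only a sketch using standard modules: the names `BROWSER_HEADERS`, `build_request` and `fetch` are made up for illustration, and the timeout value is an arbitrary assumption.

```python
import urllib.error
import urllib.request

# Hypothetical helper names; the header value mirrors the corrected code above.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url):
    """Return a Request that identifies itself with a browser-like User-Agent."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch(url, encoding="ISO-8859-1", timeout=10):
    """Return the decoded page body, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as err:
        # err.code is the status (e.g. 403); err.reason is the text (e.g. "Forbidden").
        print("Request failed: HTTP", err.code, err.reason)
        return None
```

Calling `fetch(url)` with the URL from the question would then return either the page text or `None`, so the caller can decide whether to try again later.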

Next, you can use BeautifulSoup to scrape the HTML.
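
If third-party modules are not allowed, as the question states, the standard library's `html.parser` module might be enough for simple extraction. Below is a rough sketch that collects link targets; the `LinkCollector` class and the sample markup are invented for illustration and say nothing about how the real page is structured.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Made-up sample markup standing in for the downloaded page.
sample = '<p><a href="/one.html">One</a> and <a href="/two.html">Two</a></p>'
parser = LinkCollector()
parser.feed(sample)
print(parser.links)  # ['/one.html', '/two.html']
```

In practice the `data` string from the corrected code would be passed to `parser.feed()` in place of `sample`.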
