Question 1

Hi, not every time, but sometimes when trying to access the LSE code I am thrown the ever-annoying HTTP Error 403: Forbidden message.

Does anyone know how I can overcome this issue using only standard Python modules (so sadly no Beautiful Soup)?

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
infile = urllib.request.urlopen(url)  # Open the URL
data = infile.read().decode('ISO-8859-1')  # Read the content as string decoded with ISO-8859-1
print(data)  # Print the data to the screen

However, every now and then this is the error I am shown:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/programming_practice/Assessment/Summative/removingThe403Error.py", line 5, in <module>
    webpage = urlopen(req).read().decode('ISO-8859-1')
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Process exited with code: 1

Link to a list of all the modules that are okay: https://docs.python.org/3.4/py-modindex.html

Many thanks in advance.

Answer 1

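This is probably due to mod_security or a similar server-side filter that rejects requests from known non-browser clients. By default urllib identifies itself with a User-Agent of the form Python-urllib/3.x, so you need to send a browser-like User-Agent header instead.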

Here, I corrected your code:

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"

# Open the URL as Browser, not as python urllib
page = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1')  # Read the content as string decoded with ISO-8859-1
print(data)  # Print the data to the screen
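Since the 403 only shows up some of the time, it may also help to catch the error rather than let the script exit. The following is a minimal sketch of the same header approach wrapped in two helpers; the names build_request and fetch_text, the timeout value, and the choice to return None on failure are illustrative assumptions, not part of the original code.

```python
import urllib.error
import urllib.request

# Assumption: the same generic browser-like User-Agent used in the answer.
BROWSER_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, encoding="ISO-8859-1", timeout=10):
    """Fetch url and return the decoded body, or None on an HTTP error."""
    request = build_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as err:
        # err.code holds the HTTP status, e.g. 403 for Forbidden.
        print("Request failed: HTTP", err.code, err.reason)
        return None
```

Calling fetch_text(url) with the URL from the question would then either return the page text or print the status code and return None, so the caller can decide whether to try again later.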

Next, you can use BeautifulSoup to scrape the HTML.
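If third-party packages are off limits, as the question states, the standard library's html.parser module can do basic extraction instead. This is only a rough sketch: the class name CellTextParser, the helper extract_cells, and the sample HTML are made up for illustration, and the real page's markup may well differ from a plain table of td cells.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the stripped text of every <td> cell in an HTML document."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self.cells.append("".join(self._buffer).strip())
            self._in_cell = False


def extract_cells(html):
    """Return a list with the text of each <td> cell found in html."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells


# Made-up sample markup, not taken from the real page.
sample = "<table><tr><td>Example Index</td><td> 123.45 </td></tr></table>"
print(extract_cells(sample))
```

The decoded string returned by the download step could be passed to extract_cells in place of the sample; which tags to watch for would depend on the actual page structure.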

How to get round the HTTP Error 403: Forbidden with urllib.request using Python 3
