HTML Link parsing using BeautifulSoup

2024/10/14 23:18:14

Here is the Python code I'm using to extract specific HTML from the page links I pass as a parameter. I'm using BeautifulSoup. The code sometimes works fine and sometimes gets stuck!

import urllib
from bs4 import BeautifulSoup

rawHtml = ''
url = r'http://iasexamportal.com/civilservices/tag/voice-notes?page='
for i in range(1, 49):  # iterate over the page URLs and capture the content
    sock = urllib.urlopen(url + str(i))
    html = sock.read()
    sock.close()
    rawHtml += html
    print i

Here I print the loop variable to find out where it gets stuck. It shows that the script gets stuck at a random iteration of the loop.
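A random stall like this is commonly a request that never returns: `urllib.urlopen` in Python 2 takes no timeout argument, so a dead connection can block indefinitely. Below is a minimal sketch of one way to make the loop fail fast, not a confirmed fix for this particular case. It assumes Python 3, where the call is `urllib.request.urlopen` (on 2.7, `urllib2.urlopen` accepts the same `timeout` argument). The `fetch_pages` helper, its `opener` parameter, and the 10-second timeout are illustrative choices, not part of the original script.

```python
import urllib.request


def fetch_pages(base_url, first, last, timeout=10, opener=urllib.request.urlopen):
    """Fetch pages first..last and return (pages, failed)."""
    pages = []   # raw HTML (bytes) of every page that downloaded
    failed = []  # page numbers that timed out or hit a network error
    for i in range(first, last + 1):
        try:
            # timeout is in seconds; a stalled connection raises instead of hanging
            response = opener(base_url + str(i), timeout=timeout)
            try:
                pages.append(response.read())
            finally:
                response.close()
        except OSError as exc:
            # URLError, socket.timeout and socket.gaierror all derive from OSError
            print("page %d failed: %s" % (i, exc))
            failed.append(i)
    return pages, failed
```

Calling `fetch_pages(url, 1, 48)` would cover the same pages as the loop above. `b"".join(pages)` can then be handed to `BeautifulSoup`, and `failed` shows which pages to retry.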

soup = BeautifulSoup(rawHtml, 'html.parser')
t = ''
for link in soup.find_all('a'):
    t += str(link.get('href')) + "</br>"
    #t += str(link) + "</br>"
f = open("Link.txt", 'w+')
f.write(t)
f.close()

What could be the issue? Is it a problem with the socket configuration, or something else?

This is the error I got. I checked these links for a solution: python-gaierror-errno-11004 and ioerror-errno-socket-error-errno-11004-getaddrinfo-failed. But I didn't find them very helpful.

d:\python>python ext.py
Traceback (most recent call last):
  File "ext.py", line 8, in <module>
    sock = urllib.urlopen(url+ str(i))
  File "d:\python\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "d:\python\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "d:\python\lib\urllib.py", line 350, in open_http
    h.endheaders(data)
  File "d:\python\lib\httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "d:\python\lib\httplib.py", line 893, in _send_output
    self.send(msg)
  File "d:\python\lib\httplib.py", line 855, in send
    self.connect()
  File "d:\python\lib\httplib.py", line 832, in connect
    self.timeout, self.source_address)
  File "d:\python\lib\socket.py", line 557, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11004] getaddrinfo failed
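The last frame shows that the failure happens inside `getaddrinfo`, meaning the hostname could not be resolved to an address, so no request ever reached the server. A quick way to confirm this on the affected machine is to call `socket.getaddrinfo` directly, as in the sketch below. It assumes Python 3 syntax, and `can_resolve` is a hypothetical helper name.

```python
import socket


def can_resolve(host, port=80):
    """Return True if the operating system can resolve host to an address."""
    try:
        # Same lookup that create_connection performs in the traceback
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
        return True
    except socket.gaierror as exc:
        print("cannot resolve %s: %s" % (host, exc))
        return False
```

If `can_resolve("iasexamportal.com")` returns `False` on one machine and `True` on another, the difference is more likely in the network setup (DNS, firewall, proxy) than in the script.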

It runs perfectly fine on my personal laptop, but it gives this error on my office desktop. My Python version is 2.7. I hope this information helps.

Answer

Finally, guys... it worked! The same script worked when I tried it on other PCs too. So the problem was probably the firewall or proxy settings on my office desktop, which were blocking this website.
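If the office network only allows web access through a proxy, one possible approach is to route the requests through it explicitly. The following is a sketch under assumptions: it uses Python 3's `urllib.request` (on 2.7 the same `ProxyHandler` and `build_opener` names live in `urllib2`), and the proxy address is a made-up placeholder that would have to be replaced with the real one from the network administrator.

```python
import urllib.request

# Hypothetical proxy address, for illustration only
PROXY = "http://proxy.example.com:8080"


def make_proxy_opener(proxy_url):
    """Build an opener that sends http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxy_opener(PROXY)
# opener.open(some_url, timeout=10) would then go through the proxy.

# Proxy settings Python picks up from the environment or operating system
print(urllib.request.getproxies())
```

Setting the `http_proxy` environment variable before running the script is an alternative that needs no code changes, since `urllib` reads it by default.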

