Hi, I'm fairly new to Python. I'm trying to move my old code over to multiprocessing, but I'm running into some errors that I hope someone can help me with. The code checks a few thousand links, given in a text file, for certain tags and reports any matches. Since there are a few thousand links to check, speed is an issue, hence the move to multiprocessing.
Update: I'm also getting HTTP 503 errors back. Am I sending too many requests, or am I missing something?
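If the 503s just mean I'm hitting the site too fast, I was thinking of something along these lines to slow the workers down (only a sketch: the lock, the 1-second delay and the polite_open name are my own guesses, and it would replace the br.open call inside main):

import time
import threading

request_lock = threading.Lock()  # shared by the pool's worker threads

def polite_open(url):
    # serialise requests and pause between them so the server
    # hopefully stops answering with 503
    with request_lock:
        time.sleep(1.0)
    return br.open(url, timeout=15)  # 'br' is the global Browser from the code below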
Multiprocessing code:
from mechanize import Browser
from bs4 import BeautifulSoup
import sys
import socket
from multiprocessing.dummy import Pool # This is a thread-based Pool
from multiprocessing import cpu_count

br = Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

no_stock = []

def main(lines):
    done = False
    tries = 1
    while tries and not done:
        try:
            r = br.open(lines, timeout=15)
            r = r.read()
            soup = BeautifulSoup(r, 'html.parser')
            done = True  # exit the loop
        except socket.timeout:
            print('Failed socket retrying')
            tries -= 1  # to exit when tries == 0
        except Exception as e:
            print '%s: %s' % (e.__class__.__name__, e)
            print sys.exc_info()[0]
            tries -= 1  # to exit when tries == 0
    if not done:
        print('Failed for {}\n'.format(lines))
    table = soup.find_all('div', {'class': "empty_result"})
    results = soup.find_all('strong', style='color: red;')
    if table or results:
        no_stock.append(lines)

if __name__ == "__main__":
    r = br.open('http://www.randomweb.com/')  # avoid redirection
    fileName = "url.txt"
    pool = Pool(processes=2)
    with open(fileName, "r+") as f:
        lines = pool.map(main, f)
    with open('no_stock.txt', 'w') as f:
        f.write('No. of out of stock items : ' + str(len(no_stock)) + '\n' + '\n')
        for i in no_stock:
            f.write(i + '\n')
Traceback:
Traceback (most recent call last):
  File "test2.py", line 43, in <module>
    lines = pool.map(main, f)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
UnboundLocalError: local variable 'soup' referenced before assignment
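From the traceback I'm guessing that when both attempts fail, soup is never assigned, so the find_all calls after the loop blow up. Would rewriting main like this be the right fix? (Just my attempt: I initialise soup to None and bail out early when the page never loaded.)

def main(lines):
    soup = None                      # make sure the name exists even if every attempt fails
    done = False
    tries = 1
    while tries and not done:
        try:
            r = br.open(lines, timeout=15)
            soup = BeautifulSoup(r.read(), 'html.parser')
            done = True              # exit the loop
        except Exception as e:
            print('%s: %s' % (e.__class__.__name__, e))
            tries -= 1               # to exit when tries == 0
    if soup is None:                 # nothing was fetched, so skip the parsing
        print('Failed for {}'.format(lines))
        return
    table = soup.find_all('div', {'class': 'empty_result'})
    results = soup.find_all('strong', style='color: red;')
    if table or results:
        no_stock.append(lines)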
My txt file looks something like this:
http://www.randomweb.com/item.htm?uuid=44733096229
http://www.randomweb.com/item.htm?uuid=4473309622789
http://www.randomweb.com/item.htm?uuid=447330962291
....etc
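One more thing I'm unsure about: pool.map(main, f) passes each line with its trailing newline, so should I strip the lines first? Something like this (my guess):

with open(fileName) as f:
    urls = [line.strip() for line in f if line.strip()]  # drop newlines and blank lines
lines = pool.map(main, urls)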