Multiprocessing in Python/BeautifulSoup issues

2024/9/21 3:23:11

Hi guys, I'm fairly new to Python. I'm trying to move my old code to multiprocessing, but I'm running into some errors that I hope someone can help me with. My code checks a few thousand links, listed in a text file, for certain tags and reports the links where they are found. With that many links to check, speed is an issue, hence the move to multiprocessing.

Update: I'm now getting HTTP 503 errors. Am I sending too many requests, or am I missing something?

Multiprocessing code:

from mechanize import Browser
from bs4 import BeautifulSoup
import sys
import socket
from multiprocessing.dummy import Pool  # This is a thread-based Pool
from multiprocessing import cpu_count

br = Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

no_stock = []

def main(lines):
    done = False
    tries = 1
    while tries and not done:
        try:
            r = br.open(lines, timeout=15)
            r = r.read()
            soup = BeautifulSoup(r,'html.parser')
            done = True # exit the loop
        except socket.timeout:
            print('Failed socket retrying')
            tries -= 1 # to exit when tries == 0
        except Exception as e:
            print '%s: %s' % (e.__class__.__name__, e)
            print sys.exc_info()[0]
            tries -= 1 # to exit when tries == 0
    if not done:
        print('Failed for {}\n'.format(lines))
    table = soup.find_all('div', {'class' : "empty_result"})
    results = soup.find_all('strong', style = 'color: red;')
    if table or results:
        no_stock.append(lines)

if __name__ == "__main__":
    r = br.open('http://www.randomweb.com/') #avoid redirection
    fileName = "url.txt"
    pool = Pool(processes=2)
    with open(fileName, "r+") as f:
        lines = pool.map(main, f)
    with open('no_stock.txt','w') as f :
        f.write('No. of out of stock items : '+str(len(no_stock))+'\n'+'\n')
        for i in no_stock:
            f.write(i + '\n')

Traceback:

Traceback (most recent call last):
  File "test2.py", line 43, in <module>
    lines = pool.map(main, f)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
UnboundLocalError: local variable 'soup' referenced before assignment

My txt file looks something like this:

http://www.randomweb.com/item.htm?uuid=44733096229
http://www.randomweb.com/item.htm?uuid=4473309622789
http://www.randomweb.com/item.htm?uuid=447330962291
....etc

Answer

from mechanize import Browser
from bs4 import BeautifulSoup
import sys
import socket
from multiprocessing.dummy import Pool  # This is a thread-based Pool
from multiprocessing import cpu_count

br = Browser()
no_stock = []

def main(line):
    line = line.strip()  # drop the trailing newline that comes with each line of the file
    done = False
    tries = 3
    while tries and not done:
        try:
            r = br.open(line, timeout=15)
            r = r.read()
            soup = BeautifulSoup(r, 'html.parser')
            done = True  # exit the loop
        except socket.timeout:
            print('Failed socket retrying')
            tries -= 1  # to exit when tries == 0
        except Exception:
            print('Random fail retrying')
            print(sys.exc_info()[0])
            tries -= 1  # to exit when tries == 0
    if not done:
        print('Failed for {}\n'.format(line))
        return  # soup was never assigned, so there is nothing to parse
    table = soup.find_all('div', {'class': "empty_result"})
    results = soup.find_all('strong', style='color: red;')
    if table or results:
        no_stock.append(line)

if __name__ == "__main__":
    fileName = "url.txt"
    pool = Pool(cpu_count() * 2)  # Creates a Pool with cpu_count * 2 threads.
    with open(fileName, "r") as f:
        lines = pool.map(main, f)
    with open('no_stock.txt', 'w') as f:
        f.write('No. of out of stock items : ' + str(len(no_stock)) + '\n' + '\n')
        for i in no_stock:
            f.write(i + '\n')
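
The UnboundLocalError in the question appears when every attempt fails: soup is never assigned, yet the code after the retry loop still uses it. Below is a minimal sketch of a retry helper that makes the "all attempts failed" case explicit. It uses no network access, and fetch_with_retries and always_fails are hypothetical names made up for this illustration, not part of mechanize or bs4.

```python
def fetch_with_retries(fetch, url, tries=3):
    """Call fetch(url) up to `tries` times.

    Returns the result of the first successful call, or None when
    every attempt raised an exception.
    """
    while tries:
        try:
            return fetch(url)
        except Exception as exc:
            print('%s: %s' % (exc.__class__.__name__, exc))
            tries -= 1  # stop once tries reaches 0
    return None


def always_fails(url):
    # Stand-in for a request that keeps failing (hypothetical)
    raise IOError('HTTP Error 503 for %s' % url)


page = fetch_with_retries(always_fails, 'http://www.randomweb.com/item.htm?uuid=1')
if page is None:
    # Nothing was downloaded, so skip parsing instead of touching an unassigned name
    print('giving up, nothing to parse')
```

Returning None (or returning early, as in the code above) means the parsing step only runs when a page was actually downloaded.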

pool.map takes two parameters: the first is a function (main in your code) and the second is an iterable. Each item of the iterable is passed as the argument to one call of the function (here, each line of the file).
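
As a small illustration of how pool.map feeds items to the function, here is a sketch that runs without any network access. The check function and its "ends with 1" rule are placeholders I made up for the real tag check. It also shows one possible design choice: returning a value from the worker and collecting the results from pool.map, rather than appending to a shared list.

```python
from multiprocessing.dummy import Pool  # thread-based Pool with the same API as multiprocessing.Pool


def check(line):
    """Return the URL if it is flagged, else None (the flag rule is a placeholder)."""
    url = line.strip()  # lines read from a file keep their trailing newline
    return url if url.endswith('1') else None


# Stand-in for the lines of url.txt
lines = [
    'http://www.randomweb.com/item.htm?uuid=44733096229\n',
    'http://www.randomweb.com/item.htm?uuid=447330962291\n',
]

pool = Pool(2)
results = pool.map(check, lines)  # one call per item, results come back in input order
pool.close()
pool.join()

no_stock = [url for url in results if url is not None]
print(no_stock)
```

Because pool.map returns the results in the same order as the input, no_stock can be built in the main thread after the pool has finished.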
