I've written a script in Python, in combination with BeautifulSoup, to extract the titles of books that show up when ISBN numbers are entered in the Amazon search box. I'm reading those ISBN numbers from an Excel file named amazon.xlsx. When I run the following script, it parses the titles correctly and writes them back to the Excel file as intended.
This is the link where I enter the ISBN numbers to populate the results.
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

wb = load_workbook('amazon.xlsx')
ws = wb['content']

def get_info(num):
    params = {
        'url': 'search-alias=aps',
        'field-keywords': num
    }
    res = requests.get("https://www.amazon.com/s/ref=nb_sb_noss?",params=params)
    soup = BeautifulSoup(res.text,"lxml")
    itemlink = soup.select_one("a.s-access-detail-page")
    if itemlink:
        get_data(itemlink['href'])

def get_data(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    try:
        itmtitle = soup.select_one("#productTitle").get_text(strip=True)
    except AttributeError: itmtitle = "N\A"

    print(itmtitle)

    ws.cell(row=row, column=2).value = itmtitle
    wb.save("amazon.xlsx")

if __name__ == '__main__':
    for row in range(2, ws.max_row + 1):
        if ws.cell(row=row,column=1).value==None: break
        val = ws["A" + str(row)].value
        get_info(val)
However, when I try to do the same using multiprocessing, I get the following error:
ws.cell(row=row, column=2).value = itmtitle
NameError: name 'row' is not defined
For multiprocessing, these are the changes I made to my script:
from multiprocessing import Pool

if __name__ == '__main__':
    isbnlist = []
    for row in range(2, ws.max_row + 1):
        if ws.cell(row=row,column=1).value==None: break
        val = ws["A" + str(row)].value
        isbnlist.append(val)

    with Pool(10) as p:
        p.map(get_info,isbnlist)
        p.terminate()
        p.join()
A few of the ISBNs I've tried:
9781584806844
9780917360664
9780134715308
9781285858265
9780986615108
9780393646399
9780134612966
9781285857589
9781453385982
9780134683461
How can I get rid of that error and get the desired results using multiprocessing?
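My guess is that `row` only exists as a global inside the `__main__` block, and worker processes that are started by re-importing the script (the spawn start method, the default on Windows) never run that block, so `get_data` cannot find it. Below is a minimal sketch of the direction I'm considering, with no network or Excel access: each task carries its own row number, the worker returns a `(row, title)` pair, and only the parent would write to the workbook. The names `fetch_title`, `worker`, and `collect_titles` are hypothetical, and `fetch_title` is just a stand-in for the two `requests.get` calls in my script.

```python
def fetch_title(isbn):
    """Hypothetical stand-in for the scraping done in get_info/get_data."""
    return "title for {}".format(isbn)


def worker(task):
    """Take a (row, isbn) pair and return a (row, title) pair.

    The row arrives as an argument, so nothing here depends on a
    global `row` that is only assigned in the parent's __main__ block.
    """
    row, isbn = task
    return row, fetch_title(isbn)


def collect_titles(tasks, mapper=map):
    """Apply worker to every task and return {row: title}.

    `mapper` defaults to the builtin map; a multiprocessing Pool's
    `map` method accepts the same (function, iterable) arguments and
    could be passed in instead (assumption: created under the
    __main__ guard).
    """
    return dict(mapper(worker, tasks))


if __name__ == '__main__':
    tasks = [(2, 9781584806844), (3, 9780917360664)]
    titles = collect_titles(tasks)
    # In the real script the parent would then do, for each pair:
    #     ws.cell(row=row, column=2).value = title
    # and call wb.save("amazon.xlsx") once after the loop.
    for row, title in sorted(titles.items()):
        print(row, title)
```

With a pool, the call would presumably become `collect_titles(tasks, mapper=p.map)` inside the `with Pool(10) as p:` block. I have not confirmed this against Amazon itself, so treat it as a sketch of the data flow rather than a working scraper.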