I am an absolute beginner at web scraping with Python and know very little about programming in Python. I am trying to extract the information of the lawyers in the Tennessee location. On the webpage there are multiple links, within which there are further links to the categories of lawyers, and within those are the lawyers' details.
I have already extracted the links of the various cities into a list, and have also extracted the various categories of lawyers available on each city's page. The profile links have also been fetched and stored as a set. Now I am trying to fetch each lawyer's name, address, firm name and practice area and store it all as an .xls file.
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

final = []
records = []

with requests.Session() as s:
    # Fetch the Tennessee landing page and collect the city links
    res = s.get('https://attorneys.superlawyers.com/tennessee/',
                headers={'User-agent': 'Super Bot 9000'})
    soup = bs(res.content, 'lxml')
    cities = [item['href'] for item in soup.select('#browse_view a')]

    for c in cities:
        # Each city page lists the available categories of lawyers
        r = s.get(c)
        s1 = bs(r.content, 'lxml')
        categories = [item['href'] for item in
                      s1.select('.three_browse_columns:nth-of-type(2) a')]

        for c1 in categories:
            # Each category page lists individual lawyer profiles
            r1 = s.get(c1)
            s2 = bs(r1.content, 'lxml')
            lawyers = [item['href'].split('*')[1] if '*' in item['href']
                       else item['href']
                       for item in s2.select('.indigo_text .directory_profile')]
            final.append(lawyers)

    # Flatten the list of lists and de-duplicate the profile links
    final_list = {item for sublist in final for item in sublist}

    for i in final_list:
        # Visit each profile and pull the lawyer's details
        r2 = s.get(i)
        s3 = bs(r2.content, 'lxml')
        name = s3.find('h2').text.strip()
        add = s3.find('div').text.strip()
        f_name = s3.find('a').text.strip()
        p_area = s3.find('ul', {'class': 'basic_profile aag_data_value'}).find('li').text.strip()
        records.append({'Names': name, 'Address': add,
                        'Firm Name': f_name, 'Practice Area': p_area})

df = pd.DataFrame(records, columns=['Names', 'Address', 'Firm Name', 'Practice Area'])
df = df.drop_duplicates()
df.to_excel(r'C:\Users\laptop\Desktop\lawyers.xls', sheet_name='MyData2',
            index=False, header=True)
I expected to get a .xls file, but nothing is produced while the execution is going on. The script does not terminate until I force-stop it, and no .xls file is created.
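I suspect the nested loops fire one request per city, per category, and per profile, which could easily be thousands of requests, so the run may just be very long rather than hung. Below is a minimal sanity check I could run (a sketch only; the print calls and the [:2] slices are test additions, not part of my real script) that should finish in seconds and show whether the selectors return anything and the loops are progressing:

import requests
from bs4 import BeautifulSoup as bs

with requests.Session() as s:
    res = s.get('https://attorneys.superlawyers.com/tennessee/',
                headers={'User-agent': 'Super Bot 9000'})
    soup = bs(res.content, 'lxml')
    cities = [item['href'] for item in soup.select('#browse_view a')]
    print(f'{len(cities)} city pages to visit')  # confirms the crawl is alive

    for c in cities[:2]:  # limit to the first two cities for testing
        r = s.get(c)
        s1 = bs(r.content, 'lxml')
        categories = [item['href'] for item in
                      s1.select('.three_browse_columns:nth-of-type(2) a')]
        print(f'{c}: {len(categories)} categories')

If this test run progresses normally, the full crawl is probably just slow. Also, as far as I know pandas needs the xlwt package to write .xls files, so saving as .xlsx (which uses openpyxl) might be safer once the loop actually finishes.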