I have the following webpage source:

    </li>
    <li class="cl-static-search-result" title="BELLO HONDA ACCORD "95 MIL MILLAS". REALMENTE COMO NUEVO"><a href="link1"><div class="title">BELLO HONDA ACCORD "95 MIL MILLAS". REALMENTE COMO NUEVO</div><div class="details"><div class="price">$4,600</div><div class="location">Miami</div></div></a></li>
    <li class="cl-static-search-result" title="Honda Element"><a href=" link2 "><div class="title">Honda Element</div><div class="details"><div class="price">$4,950</div><div class="location">Coral springs</div></div></a></li>
    <li class="cl-static-search-result" title="Mint Jeep"><a href=" link3 "><div class="title">Mint Jeep</div><div class="details"><div class="price">$8,500</div><div class="location">Pompano</div></div></a></li>

I need to extract the data into this format:

| URL | TITLE | PRICE |
| ---- | ------------------- | ------ |
| link1 | BELLO HONDA ACCORD | $4,600 |
| link2 | Honda Element | $4,950 |
| link3 | Mint Jeep | $8,500 |
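
One way to get exactly one row per listing, as a minimal sketch assuming the markup above is representative of the whole page: loop over each `<li class="cl-static-search-result">` and look up the link, title, and price *inside that element* instead of on the whole `soup`. The `extract_listings` name is hypothetical, and Python's built-in `html.parser` is used here in place of `html5lib` only so the example needs no extra parser package. The sample markup is the snippet above without the leading stray `</li>`.

```python
from bs4 import BeautifulSoup

# Snippet from the question; on the real page, pass the fetched HTML instead.
SAMPLE_HTML = '''
<li class="cl-static-search-result" title="BELLO HONDA ACCORD "95 MIL MILLAS". REALMENTE COMO NUEVO"><a href="link1"><div class="title">BELLO HONDA ACCORD "95 MIL MILLAS". REALMENTE COMO NUEVO</div><div class="details"><div class="price">$4,600</div><div class="location">Miami</div></div></a></li>
<li class="cl-static-search-result" title="Honda Element"><a href=" link2 "><div class="title">Honda Element</div><div class="details"><div class="price">$4,950</div><div class="location">Coral springs</div></div></a></li>
<li class="cl-static-search-result" title="Mint Jeep"><a href=" link3 "><div class="title">Mint Jeep</div><div class="details"><div class="price">$8,500</div><div class="location">Pompano</div></div></a></li>
'''


def extract_listings(html):
    """Return one (url, title, price) tuple per search-result <li>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="cl-static-search-result"):
        # Every lookup below is scoped to this one listing.
        link = item.find("a")
        title = item.find("div", class_="title")
        price = item.find("div", class_="price")
        # Skip listings that are missing any of the three fields.
        if link is None or title is None or price is None:
            continue
        rows.append((
            link.get("href", "").strip(),  # the sample hrefs contain stray spaces
            title.get_text(strip=True),
            price.get_text(strip=True),
        ))
    return rows


for url, title, price in extract_listings(SAMPLE_HTML):
    print(url, "|", title, "|", price)
```

Note that the first title comes out in full (`BELLO HONDA ACCORD "95 MIL MILLAS". REALMENTE COMO NUEVO`), because that is the text of its `div.title`; the table above shows it shortened.
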

I am able to extract the URLs. When I try to get the title and price, however, I seem to enter a loop that fetches the title/price for every listing on the page after each URL I get. Here is my code:


    from urllib import request
    from bs4 import BeautifulSoup
    from lxml import etree
    import csv

    page_url = 'URLNAME'
    rawpage = request.urlopen(page_url)
    soup = BeautifulSoup(rawpage, 'html5lib')

    links_list = []
    for link in soup.find_all('a'):
        try:
            url = link.get('href')
            # Searches every title div on the whole page, once per link
            for div in soup.find_all('div', attrs={'class': 'title'}):
                title = div.text
                print(title)
            links_list.append({'url': url})
        # if the row is missing anything...
        except AttributeError:
            # ...skip it, don't blow up.
            pass

    # save it to csv
    with open('links.csv', 'w', newline='') as csv_out:
        csv_writer = csv.writer(csv_out)
        # Create the header row
        csv_writer.writerow(['url', 'title'])
        for row in links_list:
            csv_writer.writerow([str(row['url'])])
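
The repeated output appears to come from the inner loop: `soup.find_all('div', attrs={'class': 'title'})` always searches the entire document, so every pass of `for link in soup.find_all('a')` prints every title on the page. Calling `find` on `link` itself limits the search to that `<a>` element's descendants, which is where each listing's title and price sit in the markup shown. Below is a sketch of the loop rewritten that way, also writing the price to the CSV. It assumes every listing follows the structure above; the `write_listings_csv` helper and the `/about` link in the sample are hypothetical, the latter standing in for non-listing links on a real page.

```python
import csv
import io

from bs4 import BeautifulSoup


def write_listings_csv(html, csv_out):
    """Write a url,title,price row for every <a> that contains a title and a price."""
    soup = BeautifulSoup(html, "html.parser")
    csv_writer = csv.writer(csv_out)
    csv_writer.writerow(["url", "title", "price"])  # header row
    for link in soup.find_all("a"):
        # Search only inside this <a>, not the whole page.
        title = link.find("div", class_="title")
        price = link.find("div", class_="price")
        if title is None or price is None:
            continue  # e.g. navigation links that carry no listing data
        url = (link.get("href") or "").strip()
        csv_writer.writerow([url, title.get_text(strip=True), price.get_text(strip=True)])


# Hypothetical sample: one listing from the question plus an unrelated link.
html = ('<li class="cl-static-search-result" title="Honda Element">'
        '<a href=" link2 "><div class="title">Honda Element</div>'
        '<div class="details"><div class="price">$4,950</div>'
        '<div class="location">Coral springs</div></div></a></li>'
        '<a href="/about">About</a>')
buffer = io.StringIO()
write_listings_csv(html, buffer)
print(buffer.getvalue())
```

Assuming the live page uses the same markup, the question's file-writing step would become `write_listings_csv(rawpage.read(), csv_out)` inside its `with open('links.csv', 'w', newline='') as csv_out:` block. Prices such as `$4,950` contain a comma, so `csv.writer` quotes them automatically.
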