Question 1

I have the below webpage source:

</li><li class="cl-static-search-result" title="BELLO HONDA ACCORD &quot;95 MIL MILLAS&quot;. REALMENTE COMO NUEVO"><a href="link1"><div class="title">BELLO HONDA ACCORD &quot;95 MIL MILLAS&quot;. REALMENTE COMO NUEVO</div><div class="details"><div class="price">$4,600</div><div class="location">Miami</div></div></a></li><li class="cl-static-search-result" title="Honda Element"><a href=" link2 "><div class="title">Honda Element</div><div class="details"><div class="price">$4,950</div><div class="location">Coral springs</div></div></a></li><li class="cl-static-search-result" title="Mint Jeep"><a href=" link3 "><div class="title">Mint Jeep</div><div class="details"><div class="price">$8,500</div><div class="location">Pompano</div></div></a></li>

I need to extract the data as below:

| URL  | TITLE               | PRICE  |
| ---- | ------------------- | ------ |
| link1 | BELLO HONDA ACCORD | $4,600 |
| link2 | Honda Element      | $4,950 |
| link3 | Mint Jeep          | $8,500 |

I am able to extract the URLs. However, when I attempt to get the title and price, my loop prints the title/price of every listing on the full page after each URL it finds. Below is my code:

from urllib import request 
from bs4 import BeautifulSoup
from lxml import etree
import csv
page_url = 'URLNAME'
rawpage = request.urlopen(page_url)
soup = BeautifulSoup(rawpage, 'html5lib')
links_list = []
for link in soup.find_all('a'):
    try:
        url = link.get('href')
        for div in soup.find_all('div', attrs={'class':'title'}):
            title = div.text
            print(title)
        links_list.append({'url': url})
    # if the row is missing anything...
    except AttributeError:
        # ...skip it, don't blow up.
        pass
# save it to csv
with open('links.csv', 'w', newline='') as csv_out:
    csv_writer = csv.writer(csv_out)
    # Create the header row
    csv_writer.writerow(['url', 'title'])
    for row in links_list:
        csv_writer.writerow([str(row['url'])])

Answer

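Try changing your strategy for selecting and iterating over elements. Your inner loop calls soup.find_all(...) on the whole document, so it returns every title on the page for each link. Instead, select each li result first (for example with CSS selectors) and read the link, title, and price from within that element: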

...
data = []
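soup = BeautifulSoup(html, 'html.parser')
# each <li title="..."> is one listing, so look up its fields inside it only
for e in soup.select('li[title]'):
    data.append({
        'link': e.a.get('href'),
        'title': e.get('title'),
        'price': e.select_one('.price').get_text(),
    })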
data

You can then process the resulting list of dicts to write your CSV file, create a DataFrame, and so on.
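As a rough sketch of the "write your file" step, the list of dicts can be handed to csv.DictWriter from the standard library. The rows below are hard-coded stand-ins shaped like the data list (values copied from the question's sample), and links.csv is the file name from the question; the column order is an assumption.

```python
import csv

# Stand-in rows shaped like the `data` list built by the loop
# (values copied from the sample markup in the question).
data = [
    {'link': 'link1', 'title': 'BELLO HONDA ACCORD "95 MIL MILLAS". REALMENTE COMO NUEVO', 'price': '$4,600'},
    {'link': 'link2', 'title': 'Honda Element', 'price': '$4,950'},
    {'link': 'link3', 'title': 'Mint Jeep', 'price': '$8,500'},
]

# newline='' lets the csv module control line endings itself.
with open('links.csv', 'w', newline='', encoding='utf-8') as csv_out:
    writer = csv.DictWriter(csv_out, fieldnames=['link', 'title', 'price'])
    writer.writeheader()     # header row: link,title,price
    writer.writerows(data)   # one CSV row per dict
```

DictWriter quotes fields that contain commas or double quotes, so prices like $4,600 and the first title survive a round trip through the file.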

Example

from bs4 import BeautifulSoup
html = '''
<li class="cl-static-search-result" title="BELLO HONDA ACCORD &quot;95 MIL MILLAS&quot;. REALMENTE COMO NUEVO"><a href="link1"><div class="title">BELLO HONDA ACCORD &quot;95 MIL MILLAS&quot;. REALMENTE COMO NUEVO</div><div class="details"><div class="price">$4,600</div><div class="location">Miami</div></div></a></li><li class="cl-static-search-result" title="Honda Element"><a href=" link2 "><div class="title">Honda Element</div><div class="details"><div class="price">$4,950</div><div class="location">Coral springs</div></div></a></li><li class="cl-static-search-result" title="Mint Jeep"><a href=" link3 "><div class="title">Mint Jeep</div><div class="details"><div class="price">$8,500</div><div class="location">Pompano</div></div></a></li>
'''
data = []
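soup = BeautifulSoup(html, 'html.parser')
# each <li title="..."> is one listing, so look up its fields inside it only
for e in soup.select('li[title]'):
    data.append({
        'link': e.a.get('href'),
        'title': e.get('title'),
        'price': e.select_one('.price').get_text(),
    })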
data
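Note that some href values in the sample markup carry stray spaces (" link2 "), and the loop above keeps them as they are. A minimal sketch of a clean-up pass follows, assuming data has the shape produced above; the clean_row helper is a hypothetical name introduced only for illustration.

```python
# Rows shaped like the result of the loop above; the spaces around
# link2 and link3 come from the sample markup itself.
data = [
    {'link': 'link1', 'title': 'BELLO HONDA ACCORD "95 MIL MILLAS". REALMENTE COMO NUEVO', 'price': '$4,600'},
    {'link': ' link2 ', 'title': 'Honda Element', 'price': '$4,950'},
    {'link': ' link3 ', 'title': 'Mint Jeep', 'price': '$8,500'},
]

def clean_row(row):
    """Return a copy of row with surrounding whitespace stripped from every value."""
    return {key: value.strip() for key, value in row.items()}

cleaned = [clean_row(row) for row in data]
print(cleaned[1]['link'])
```

Alternatively, you could call .strip() on e.a.get('href') directly inside the loop.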

How to retrieve nested data with BeautifulSoup?
