Web Scraping BeautifulSoup - Next Page parsing

2024/11/10 12:58:29

I'm just learning web scraping and want to output the results from this website to a CSV file: https://www.avbuyer.com/aircraft/private-jets

However, I'm struggling with parsing the next pages. Here is my code (written with the help of Amen Aziz), which only gives me the first page.
I'm using Chrome, so I'm not sure whether that makes any difference. I'm running Python 3.8.12.
Thank you in advance.

import requests
from bs4 import BeautifulSoup
import pandas as pd
headers= {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.avbuyer.com/aircraft/private-jets')
soup = BeautifulSoup(response.content, 'html.parser')
postings = soup.find_all('div', class_ = 'listing-item premium')
temp=[]
for post in postings:
    link = post.find('a', class_ = 'more-info').get('href')
    link_full = 'https://www.avbuyer.com'+ link
    plane = post.find('h2', class_ = 'item-title').text
    price = post.find('div', class_ = 'price').text
    location = post.find('div', class_ = 'list-item-location').text
    desc = post.find('div', class_ = 'list-item-para').text
    try:
        tag = post.find('div', class_ = 'list-viewing-date').text
    except:
        tag = 'N/A'
    updated = post.find('div', class_ = 'list-update').text
    t=post.find_all('div',class_='list-other-dtl')
    for i in t:
        data=[tup.text for tup in i.find_all('li')]
        years=data[0]
        s=data[1]
        total_time=data[2]
        temp.append([plane,price,location,years,s,total_time,desc,tag,updated,link_full])
df=pd.DataFrame(temp,columns=["plane","price","location","Year","S/N","Totaltime","Description","Tag","Last Updated","link"])
next_page = soup.find('a', {'rel':'next'}).get('href')
next_page_full = 'https://www.avbuyer.com'+next_page
next_page_full
url = next_page_full
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
df.to_csv('/Users/xxx/avbuyer.csv')

Answer

Try this. It loops over the numbered listing pages and prints the combined DataFrame. If you want a CSV file, replace the final print(df) line with df.to_csv("prod.csv").

import requests
from bs4 import BeautifulSoup
import pandas as pd
headers = {'User-Agent': 'Mozilla/5.0'}
temp=[]
# Request each listing page in turn (page-1 .. page-19)
for page in range(1, 20):
    response = requests.get("https://www.avbuyer.com/aircraft/private-jets/page-{page}".format(page=page), headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    postings = soup.find_all('div', class_='grid-x list-content')
    for post in postings:
        plane = post.find('h2', class_='item-title').text
        # Some listings have no price element; find() then returns None
        try:
            price = post.find('div', class_='price').text
        except AttributeError:
            price = " "
        location = post.find('div', class_='list-item-location').text
        t = post.find_all('div', class_='list-other-dtl')
        for i in t:
            data = [tup.text for tup in i.find_all('li')]
            years = data[0]
            s = data[1]
            total_time = data[2]
            temp.append([plane, price, location, years, s, total_time])
# Build the DataFrame once, after all pages have been collected
df = pd.DataFrame(temp, columns=["plane", "price", "location", "Years", "S/N", "Totaltime"])
print(df)

output:

                      plane         price  ...             S/N         Totaltime
0            Gulfstream G280     Make offer  ...        S/N 2007   Total Time 2528
1    Dassault Falcon 2000LXS     Make offer  ...         S/N 377     Total Time 33
2       Cirrus Vision SF50 G1  Please call   ...        S/N 0080    Total Time 615
3              Gulfstream IV     Make offer  ...        S/N 1148   Total Time 6425
4            Gulfstream G280     Make offer  ...        S/N 2072   Total Time 1918
..                        ...           ...  ...             ...               ...
342       Embraer Phenom 100       Now Sold  ...    S/N 50000035   Total Time 3417
343          Gulfstream G200       Now Sold  ...         S/N 152   Total Time 7209
344     Cessna Citation XLS+       Now Sold  ...           S/N -      Total Time -
345    Cessna Citation Ultra       Now Sold  ...    S/N 560-0393  Total Time 12947
346    Cessna Citation Excel       Now Sold  ...  S/N 560XL-5253   Total Time 4850
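
As an alternative to hard-coding range(1, 20), the loop could follow the rel="next" link that the question already looks up, and stop when a page no longer has one. The following is only a minimal sketch, assuming the site exposes an <a rel="next"> link on every page except the last. The names find_next_url, fetch, crawl and the max_pages safety cap are made up for illustration and are not part of the answer above.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

HEADERS = {'User-Agent': 'Mozilla/5.0'}


def find_next_url(html, current_url):
    """Return the absolute URL of the rel="next" link, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    link = soup.find('a', {'rel': 'next'})
    if link is None or not link.get('href'):
        return None
    # urljoin handles both relative ("/aircraft/...") and absolute hrefs
    return urljoin(current_url, link['href'])


def fetch(url):
    """Download one page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.text


def crawl(start_url, get_html=fetch, max_pages=50):
    """Yield (url, html) for each page, following rel="next" links.

    Stops when there is no next link, when a URL repeats,
    or after max_pages pages (an arbitrary safety cap).
    """
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = get_html(url)
        yield url, html
        url = find_next_url(html, url)
```

Each html string yielded by crawl('https://www.avbuyer.com/aircraft/private-jets') could then be parsed with the same find_all('div', class_='grid-x list-content') loop as in the answer, appending rows to temp before building the DataFrame once at the end.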
