How to create a DataFrame with columns based on scraped data?

2024/10/7 10:18:47
import requests, re
from bs4 import BeautifulSoup

data = []

soup = BeautifulSoup(requests.get('https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaGyIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AuS4sJ4GwAIB0gIkYWJlYmZiMWItNWJjMi00M2Y2LTk3MGUtMzI2ZGZmMmIyNzMz2AIF4AIB&aid=304142&dest_id=-2092174&dest_type=city&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&nflt=ht_id%3D204&rows=15',headers={'user-agent':'some agent'}).text)
num_results = int(re.search(r'\d+',soup.select_one('div:has(+[data-testid="pagination"])').text).group(0))

for i in range(0,int(num_results/25)):
    soup = BeautifulSoup(requests.get(f'https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaGyIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AuS4sJ4GwAIB0gIkYWJlYmZiMWItNWJjMi00M2Y2LTk3MGUtMzI2ZGZmMmIyNzMz2AIF4AIB&aid=304142&dest_id=-2092174&dest_type=city&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&nflt=ht_id%3D204&rows=15&offset={int(i*25)}',headers={'user-agent':'some agent'}).text)
    data.extend([e.select_one('[data-testid="title"]').text for e in soup.select('[data-testid="property-card"]')])
    data.extend([e.select_one('[class="d8eab2cf7f c90c0a70d3 db63693c62"]') for e in soup.select('[data-testid="property-card"]')])

data


I am getting the names and reviews for all pages in a single list. I want the result in separate columns for names and reviews.

I want to get my result like this:

enter image description here

Answer

I couldn't fully understand your question or what you want. It would help if you could show a sample of the DataFrame you are after. In general, though, you can do it like this. For example, in this data the latitude and longitude are in the same column, and you can separate them into two columns with a split function. Don't forget to add headers.

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
from datetime import datetime

base_url = 'https://www.booking.com'
urlss = 'https://www.booking.com/searchresults.html?req_children=0&label=gen173nr-1FCAEoggI46AdIM1gEaGyIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AuS4sJ4GwAIB0gIkYWJlYmZiMWItNWJjMi00M2Y2LTk3MGUtMzI2ZGZmMmIyNzMz2AIF4AIB&group_children=0&dest_type=city&rows=15&aid=304142&dest_id=-2092174&nflt=ht_id%3D204&req_adults=2&no_rooms=1&group_adults=2'
data = []  # one dict per property page


def pars(url):
    # Fetch a single property page and collect its fields into one dict
    r = requests.get(url)
    soup = bs(r.text, 'html.parser')
    foor = {}
    try:
        foor['description'] = soup.find('div', id='property_description_content').text
        foor['Title'] = soup.find('h2', class_='d2fee87262 pp-header__title').text
        x = soup.find_all('div', class_='a815ec762e ab06168e66')
        div_map = soup.select_one('#hotel_sidebar_static_map')
        if div_map:
            # Latitude and longitude arrive together in one attribute value
            foor['x_lnge'] = div_map['data-atlas-latlng']
        for f in range(0, len(x)):
            foor[f'feature{f}'] = x[f].text
        data.append(foor)
    except Exception:
        # Skip pages where an expected element is missing
        pass


def general():
    # Collect the property links from the search results page and parse each
    r = requests.get(urlss)
    soup = bs(r.text, 'html.parser')
    x = soup.select('header > a')
    for f in x:
        urls = base_url + f['href']
        print(urls)
        pars(urls)


def export_data(data):
    # Each dict becomes a row; the dict keys become the column names
    f = pd.DataFrame(data)
    f = f.drop_duplicates()
    a = datetime.now().strftime('%Y_%m_%d')
    f.to_excel(f'{a}booking.xlsx', index=False)


if __name__ == '__main__':
    general()
    export_data(data)
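
To make the column idea concrete, here is a minimal sketch that runs without any network access. It assumes a made-up HTML snippet in place of the real results page: the `review` class and the `data-latlng` attribute are hypothetical names chosen for illustration, and only the `data-testid` selectors come from the question. It builds one dict per property card, so each name stays on the same row as its own review, and then splits a combined "lat,lng" string into two columns as described above.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for a search results page; the real markup will differ.
html = """
<div data-testid="property-card">
  <div data-testid="title">Hotel A</div>
  <div class="review">8.5</div>
  <span data-latlng="19.07,72.87"></span>
</div>
<div data-testid="property-card">
  <div data-testid="title">Hotel B</div>
  <span data-latlng="18.52,73.85"></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for card in soup.select('[data-testid="property-card"]'):
    title = card.select_one('[data-testid="title"]')
    review = card.select_one(".review")        # hypothetical class name
    latlng = card.select_one("[data-latlng]")  # hypothetical attribute
    # One dict per card: a missing element becomes None instead of
    # shifting the remaining values out of alignment.
    rows.append({
        "name": title.get_text(strip=True) if title else None,
        "review": review.get_text(strip=True) if review else None,
        "latlng": latlng["data-latlng"] if latlng else None,
    })

# The dict keys become the column headers.
df = pd.DataFrame(rows, columns=["name", "review", "latlng"])

# Split the combined "lat,lng" string into two numeric columns.
df[["latitude", "longitude"]] = (
    df["latlng"].str.split(",", expand=True).astype(float)
)
df = df.drop(columns="latlng")

print(df)
```

Collecting rows as dicts, rather than extending one flat list first with names and then with reviews, is what keeps the two values in separate columns. The same loop body could sit inside the pagination loop from the question.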
