How do I pull multiple values from an HTML page using Python?

2024/7/5 12:13:16

I'm doing some data analysis, for my own learning, on NHL spread/betting odds information. I'm able to pull some of the information, but not the entire data set. I want to pull the list of games and the associated values into a pandas DataFrame, but I have not been able to write the proper loop over the HTML tags. I've tried the findAll option and the XPath route, and I'm not successful with either.

from bs4 import BeautifulSoup
import requests

page_link = ''
page_response = requests.get(page_link, timeout=5)
# here, we fetch the content from the url, using the requests library
page_content = BeautifulSoup(page_response.content, "html.parser")
# Take out the <div> of name and get its value
name_box = page_content.find('div', attrs={'class': 'datarow'})
name = name_box.text.strip()
print(name)

This script loops over every datarow div, pulls out each item individually, appends the items to a list of rows, and then builds a pandas DataFrame from that list.

from bs4 import BeautifulSoup
import requests
import pandas as pd

page_link = ''
page_response = requests.get(page_link, timeout=5)
# here, we fetch the content from the url, using the requests library
page_content = BeautifulSoup(page_response.content, "html.parser")
# Take out the <div> of name and get its value
tables = page_content.find_all('div', class_='datarow')
# Iterate through rows
rows = []
# Iterate through each datarow and pull out each home/away separately
for table in tables:
    # Get time and date
    time_and_date_tag = table.find_all('div', attrs={"class": "time"})[0].contents
    date = time_and_date_tag[1]
    time = time_and_date_tag[-1]
    # Get teams
    teams_tag = table.find_all('div', attrs={"class": "datacell teams"})[0].contents[-1].contents
    home_team = teams_tag[1].text
    away_team = teams_tag[-1].text
    # Get opening
    opening_tag = table.find_all('div', attrs={"class": "child-open"})[0].contents
    home_open_value = opening_tag[1]
    away_open_value = opening_tag[-1]
    # Get current
    current_tag = table.find_all('div', attrs={"class": "child-current"})[0].contents
    home_current_value = current_tag[1]
    away_current_value = current_tag[-1]
    # Create list
    rows.append([time, date, home_team, away_team,
                 home_open_value, away_open_value,
                 home_current_value, away_current_value])

columns = ['time', 'date', 'home_team', 'away_team',
           'home_open', 'away_open',
           'home_current', 'away_current']
print(pd.DataFrame(rows, columns=columns))
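The key difference from the attempt in the question is that find returns only the first matching tag, while find_all returns every match, so the loop has something to iterate over. Below is a minimal offline sketch of the same per-row loop that can be run without a network request. The SAMPLE_HTML markup, the pair helper, and the parse_rows function are all hypothetical: the real page's nesting is not shown here, so the sample only reuses the class names from the script above and assumes each cell holds two span elements. Unlike the positional .contents indexing above, this version pads missing cells with None rather than raising an IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption that reuses the class names from the
# script above. The real page may nest its cells differently.
SAMPLE_HTML = """
<div class="datarow">
  <div class="time"><span>07/05</span><span>7:00 PM</span></div>
  <div class="datacell teams"><span>Home A</span><span>Away A</span></div>
  <div class="child-open"><span>-110</span><span>+100</span></div>
  <div class="child-current"><span>-120</span><span>+105</span></div>
</div>
<div class="datarow">
  <div class="time"><span>07/06</span><span>8:30 PM</span></div>
  <div class="datacell teams"><span>Home B</span><span>Away B</span></div>
  <div class="child-open"><span>+130</span><span>-150</span></div>
</div>
"""


def pair(row, selector):
    """Return the first two <span> texts under the first match of selector.

    Missing cells or missing spans are padded with None.
    """
    cell = row.select_one(selector)
    texts = [s.get_text(strip=True) for s in cell.find_all("span")] if cell else []
    texts += [None, None]
    return texts[0], texts[1]


def parse_rows(html):
    """Build one dict per datarow <div> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # find_all returns every datarow; find would return only the first one.
    for row in soup.find_all("div", class_="datarow"):
        date, time = pair(row, "div.time")
        home_team, away_team = pair(row, "div.datacell.teams")
        home_open, away_open = pair(row, "div.child-open")
        home_current, away_current = pair(row, "div.child-current")
        records.append({
            "time": time, "date": date,
            "home_team": home_team, "away_team": away_team,
            "home_open": home_open, "away_open": away_open,
            "home_current": home_current, "away_current": away_current,
        })
    return records


records = parse_rows(SAMPLE_HTML)
```

Since pandas accepts a list of dicts, pd.DataFrame(records) should give one row per game with the dict keys as column names. To run this against the live page, pass page_response.content to parse_rows instead of SAMPLE_HTML and adjust the selectors to whatever the page actually contains.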

