How do I pull multiple values from an HTML page using Python?

2024/7/5 12:13:16

I'm doing some data analysis, for my own knowledge, on NHL spread/betting odds information. I'm able to pull some of the information, but not the entire data set. I want to pull the list of games and the associated odds into a pandas DataFrame, but I haven't been able to build the proper loop around the HTML tags. I've tried both the findAll option and the XPath route, without success with either.
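For comparison with the XPath route mentioned above, here is a minimal sketch of how that loop can look with lxml. It assumes lxml is installed and uses a small hypothetical HTML snippet rather than the real page's markup. The key point is that an absolute XPath returns a list of every matching row, and a relative XPath (starting with ".") then searches only inside each row.

```python
from lxml import html

# Hypothetical markup standing in for the page's game rows (not the real site's HTML)
SAMPLE = """
<div>
  <div class="datarow"><div class="datacell teams"><span>Team A</span><span>Team B</span></div></div>
  <div class="datarow"><div class="datacell teams"><span>Team C</span><span>Team D</span></div></div>
</div>
"""

page = html.fromstring(SAMPLE.strip())

games = []
# An absolute XPath ("//") returns a list of every matching row, not just the first
for row in page.xpath('//div[@class="datarow"]'):
    # A relative XPath (leading ".") searches only inside the current row
    teams = row.xpath('.//div[contains(@class, "teams")]/span/text()')
    games.append((str(teams[0]), str(teams[-1])))

print(games)  # [('Team A', 'Team B'), ('Team C', 'Team D')]
```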

from bs4 import BeautifulSoup
import requests

page_link = 'https://www.thespread.com/nhl-hockey-public-betting-chart'
# here, we fetch the content from the url, using the requests library
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")

# Take out the <div> of name and get its value
name_box = page_content.find('div', attrs={'class': 'datarow'})
name = name_box.text.strip()
print(name)
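The code above calls find(), which stops at the first matching element, so only one datarow comes back. findAll (the older alias of find_all) returns every match as a list that can drive a loop. A minimal sketch, using hypothetical markup rather than the real page, shows the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two rows that share the class "datarow"
html = """
<div class="datarow">Game 1</div>
<div class="datarow">Game 2</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match and returns a single Tag (or None if nothing matches)
first = soup.find("div", class_="datarow")
print(first.get_text(strip=True))  # Game 1

# find_all() returns a list of every match, so it can drive a loop over all rows
names = [row.get_text(strip=True) for row in soup.find_all("div", class_="datarow")]
print(names)  # ['Game 1', 'Game 2']
```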
Answer

This script loops over every datarow div, pulls out each item individually, and then collects the rows into a pandas DataFrame.

from bs4 import BeautifulSoup
import requests
import pandas as pd

page_link = 'https://www.thespread.com/nhl-hockey-public-betting-chart'
# Fetch the content from the URL using the requests library
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")

# Collect every <div class="datarow">, one per game
tables = page_content.find_all('div', class_='datarow')

rows = []
# Iterate through each datarow and pull out each home/away value separately
for table in tables:
    # Get time and date
    time_and_date_tag = table.find_all('div', attrs={"class": "time"})[0].contents
    date = time_and_date_tag[1]
    time = time_and_date_tag[-1]

    # Get teams
    teams_tag = table.find_all('div', attrs={"class": "datacell teams"})[0].contents[-1].contents
    home_team = teams_tag[1].text
    away_team = teams_tag[-1].text

    # Get opening values
    opening_tag = table.find_all('div', attrs={"class": "child-open"})[0].contents
    home_open_value = opening_tag[1]
    away_open_value = opening_tag[-1]

    # Get current values
    current_tag = table.find_all('div', attrs={"class": "child-current"})[0].contents
    home_current_value = current_tag[1]
    away_current_value = current_tag[-1]

    # Collect this game's values as one row
    rows.append([time, date, home_team, away_team,
                 home_open_value, away_open_value,
                 home_current_value, away_current_value])

columns = ['time', 'date', 'home_team', 'away_team',
           'home_open', 'away_open',
           'home_current', 'away_current']
print(pd.DataFrame(rows, columns=columns))
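Because the live page's layout may have changed since this answer was written, here is a self-contained sketch of the same loop that runs against a small hypothetical HTML snippet. The markup, the sample values, and the helper names first_and_last and parse_rows are assumptions for illustration only. It swaps the answer's .contents[1] / .contents[-1] indexing for CSS selectors plus stripped_strings, which skips whitespace-only text nodes instead of relying on their positions.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, simplified markup modeled on the class names used above;
# the real page's structure may differ.
SAMPLE_HTML = """
<div class="datarow">
  <div class="time"><span>2024-10-05</span><span>19:00</span></div>
  <div class="datacell teams"><span>Team A</span><span>Team B</span></div>
  <div class="child-open"><span>-1.5</span><span>+1.5</span></div>
  <div class="child-current"><span>-1.5</span><span>+1.5</span></div>
</div>
<div class="datarow">
  <div class="time"><span>2024-10-06</span><span>20:00</span></div>
  <div class="datacell teams"><span>Team C</span><span>Team D</span></div>
  <div class="child-open"><span>+1.5</span><span>-1.5</span></div>
  <div class="child-current"><span>+1.5</span><span>-1.0</span></div>
</div>
"""

COLUMNS = ['time', 'date', 'home_team', 'away_team',
           'home_open', 'away_open', 'home_current', 'away_current']


def first_and_last(row, selector):
    """Hypothetical helper: first and last non-blank text pieces inside the matched cell."""
    cell = row.select_one(selector)
    texts = list(cell.stripped_strings)
    return texts[0], texts[-1]


def parse_rows(html):
    """Hypothetical helper: build one DataFrame row per <div class="datarow">."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("div", class_="datarow"):
        date, time = first_and_last(table, "div.time")
        home_team, away_team = first_and_last(table, "div.datacell.teams")
        home_open, away_open = first_and_last(table, "div.child-open")
        home_current, away_current = first_and_last(table, "div.child-current")
        rows.append([time, date, home_team, away_team,
                     home_open, away_open, home_current, away_current])
    return pd.DataFrame(rows, columns=COLUMNS)


df = parse_rows(SAMPLE_HTML)
print(df)
```

To try it on the real page, pass page_response.text to parse_rows and adjust the selectors to whatever markup the site actually serves.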
https://en.xdnf.cn/q/119576.html
