Exporting a DataFrame to Excel using pandas without overwriting

2024/11/19 17:43:52

How can I export a DataFrame to Excel without overwriting the previous data? For example: I'm doing web scraping and there is a table with pagination, so I take page 1, save it in a DataFrame, export it to Excel, and do the same again for page 2. But every record is erased when I save, and only the last page remains. Sorry for my English, here is my code:

import time
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

i=1
url = "https://stats.nba.com/players/traditional/?PerMode=Totals&Season=2019-20&SeasonType=Regular%20Season&sort=PLAYER_NAME&dir=-1"
driver = webdriver.Firefox(executable_path=r'C:/Users/Fabio\Desktop/robo/geckodriver.exe')
driver.get(url)
time.sleep(5)
driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]/div[1]/table/thead/tr/th[9]").click()
contador = 1
#loop pagination
while(contador < 4):
    #finding table
    elemento = driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]")
    html_content = elemento.get_attribute('outerHTML')
    # 2. Parse HTML - BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')
    table = soup.find(name='table')
    # 3. Data Frame - Pandas
    df_full = pd.read_html(str(table))[0]
    df = df_full[['PLAYER','TEAM', 'PTS']]
    df.columns = ['jogador','time', 'pontuacao']
    dados1 = pd.DataFrame(df)
    driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div/div/a[2]").click()
    contador = contador + 1
    #4. export to excel
    dados = pd.DataFrame(df)
    dados.to_excel("fabinho.xlsx")
driver.quit()

Answer

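You are reassigning df to whatever data you retrieved every time you go through the loop, so each pass replaces the previous page's rows and only the last page ends up in the file. A solution would be to append each page's DataFrame to a list and then pd.concat the list once at the end, before exporting.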

import time
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

i=1
url = "https://stats.nba.com/players/traditional/?PerMode=Totals&Season=2019-20&SeasonType=Regular%20Season&sort=PLAYER_NAME&dir=-1"
driver = webdriver.Firefox(executable_path=r'C:/Users/Fabio\Desktop/robo/geckodriver.exe')
driver.get(url)
time.sleep(5)
driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]/div[1]/table/thead/tr/th[9]").click()
contador = 1
df_list = list()
#loop pagination
while(contador < 4):
    #finding table
    elemento = driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]")
    html_content = elemento.get_attribute('outerHTML')
    # 2. Parse HTML - BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')
    table = soup.find(name='table')
    # 3. Data Frame - Pandas
    df_full = pd.read_html(str(table))[0]
    df = df_full[['PLAYER','TEAM', 'PTS']]
    df.columns = ['jogador','time', 'pontuacao']
    df_list.append(df)
    driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div/div/a[2]").click()
    contador = contador + 1
#4. export to excel
dados = pd.concat(df_list)
dados.to_excel("fabinho.xlsx")
driver.quit()
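
To see the list-then-concat pattern without opening a browser, here is a minimal sketch that uses two small made-up DataFrames in place of the scraped pages. The helper name combine_pages, the sample rows, and the extra AST column are assumptions for illustration only, not part of the original code. It also passes ignore_index=True to pd.concat, which the answer above does not do, so that the combined rows get one continuous index instead of each page restarting at 0.

```python
import pandas as pd


def combine_pages(pages):
    """Concatenate per-page DataFrames into one, renumbering the rows."""
    frames = []
    for page in pages:
        # Keep only the columns of interest and rename them, as in the answer.
        frame = page[["PLAYER", "TEAM", "PTS"]].rename(
            columns={"PLAYER": "jogador", "TEAM": "time", "PTS": "pontuacao"}
        )
        frames.append(frame)
    # ignore_index=True gives one continuous 0..n-1 index instead of
    # repeating each page's own 0-based index.
    return pd.concat(frames, ignore_index=True)


# Made-up stand-ins for two scraped pages (hypothetical values, not real NBA data).
page_1 = pd.DataFrame(
    {"PLAYER": ["A", "B"], "TEAM": ["X", "Y"], "PTS": [10, 20], "AST": [1, 2]}
)
page_2 = pd.DataFrame({"PLAYER": ["C"], "TEAM": ["Z"], "PTS": [30], "AST": [3]})

dados = combine_pages([page_1, page_2])
print(dados)
```

With the combined frame in hand, a single dados.to_excel("fabinho.xlsx", index=False) after the loop writes every page at once; index=False is optional and only drops the index column from the sheet.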
