Iterate through all the rows in a table using python lxml xpath

2024/9/19 10:07:32

This is the HTML source of the part of the page I want to extract data from.

Webpage: http://gbgfotboll.se/information/?scr=table&ftid=51168 (the table is at the bottom of the page).

    <html>
    <table class="clCommonGrid" cellspacing="0">
      <thead>
        <tr><td colspan="3">Kommande matcher</td></tr>
        <tr><th style="width:1%;">Tid</th><th style="width:69%;">Match</th><th style="width:30%;">Arena</th></tr>
      </thead>
      <tbody class="clGrid">
        <tr class="clTrOdd">
          <td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-26<!-- br ok --> 19:30</span></span></td>
          <td><a href="?scr=result&amp;fmid=2669197">Guldhedens IK - IF Warta</a></td>
          <td><a href="?scr=venue&amp;faid=847">Guldheden Södra 1 Konstgräs</a> </td>
        </tr>
        <tr class="clTrEven">
          <td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-26<!-- br ok --> 13:00</span></span></td>
          <td><a href="?scr=result&amp;fmid=2669176">Romelanda UF - IK Virgo</a></td>
          <td><a href="?scr=venue&amp;faid=941">Romevi 1 Gräs</a> </td>
        </tr>
        <tr class="clTrOdd">
          <td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-27<!-- br ok --> 13:00</span></span></td>
          <td><a href="?scr=result&amp;fmid=2669167">Kode IF - IK Kongahälla</a></td>
          <td><a href="?scr=venue&amp;faid=912">Kode IP 1 Gräs</a> </td>
        </tr>
        <tr class="clTrEven">
          <td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-27<!-- br ok --> 14:00</span></span></td>
          <td><a href="?scr=result&amp;fmid=2669147">Floda BoIF - Partille IF FK </a></td>
          <td><a href="?scr=venue&amp;faid=218">Flodala IP 1</a> </td>
        </tr>
      </tbody>
    </table>
    </html>

Right now I have this code, which produces the result I want:

    import lxml.html

    url = "http://gbgfotboll.se/information/?scr=table&ftid=51168"
    html = lxml.html.parse(url)
    for i in range(12):
        xpath1 = ".//*[@id='content-primary']/table[3]/tbody/tr[%d]/td[1]/span/span//text()" % (i + 1)
        xpath2 = ".//*[@id='content-primary']/table[3]/tbody/tr[%d]/td[2]/a/text()" % (i + 1)
        time = html.xpath(xpath1)[1]
        date = html.xpath(xpath1)[0]
        teamName = html.xpath(xpath2)[0]
        if date == '2014-09-27':
            print(time, teamName)

This gives the result:

    13:00 Romelanda UF - IK Virgo
    13:00 Kode IF - IK Kongahälla
    14:00 Floda BoIF - Partille IF FK

Now to the question. I don't want to use a for loop over range() because it isn't robust: the number of rows in that table can change, and if the index goes past the last row, html.xpath(...) returns an empty list and indexing it crashes with an IndexError. How can I iterate over the rows safely, visiting every row that is actually in the table, no more and no less? Also, if you have any other suggestions for making the code better or faster, please go ahead.
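A minimal sketch of an index-free loop, run offline against the sample table from the question rather than the live page. It assumes the snippet is representative of the live markup; because the snippet has only one table and no content-primary wrapper, the row path is adapted to //table[@class='clCommonGrid']/tbody/tr, and the helper name iter_rows is hypothetical. Since XPath returns the row elements themselves, the loop body runs once per existing row and there is no index that can go out of bounds.

```python
import lxml.html

# Sample table copied from the question (header rows plus the four match rows).
SAMPLE = """<html><table class="clCommonGrid" cellspacing="0"><thead>
<tr><td colspan="3">Kommande matcher</td></tr>
<tr><th style="width:1%;">Tid</th><th style="width:69%;">Match</th><th style="width:30%;">Arena</th></tr>
</thead><tbody class="clGrid">
<tr class="clTrOdd"><td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-26<!-- br ok --> 19:30</span></span></td><td><a href="?scr=result&amp;fmid=2669197">Guldhedens IK - IF Warta</a></td><td><a href="?scr=venue&amp;faid=847">Guldheden Södra 1 Konstgräs</a> </td></tr>
<tr class="clTrEven"><td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-26<!-- br ok --> 13:00</span></span></td><td><a href="?scr=result&amp;fmid=2669176">Romelanda UF - IK Virgo</a></td><td><a href="?scr=venue&amp;faid=941">Romevi 1 Gräs</a> </td></tr>
<tr class="clTrOdd"><td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-27<!-- br ok --> 13:00</span></span></td><td><a href="?scr=result&amp;fmid=2669167">Kode IF - IK Kongahälla</a></td><td><a href="?scr=venue&amp;faid=912">Kode IP 1 Gräs</a> </td></tr>
<tr class="clTrEven"><td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-27<!-- br ok --> 14:00</span></span></td><td><a href="?scr=result&amp;fmid=2669147">Floda BoIF - Partille IF FK </a></td><td><a href="?scr=venue&amp;faid=218">Flodala IP 1</a> </td></tr>
</tbody></table></html>"""


def iter_rows(doc):
    """Hypothetical helper: yield (date, time, team) for every row in the tbody."""
    # tbody/tr skips the two header rows in thead and returns only real rows.
    for row in doc.xpath("//table[@class='clCommonGrid']/tbody/tr"):
        # Date and time are two text nodes separated by an HTML comment.
        date, time = [t.strip() for t in row.xpath("td[1]/span/span//text()")]
        # string() gives the link text; strip() drops trailing spaces
        # such as the one in "Floda BoIF - Partille IF FK ".
        team = row.xpath("string(td[2]/a)").strip()
        yield date, time, team


doc = lxml.html.fromstring(SAMPLE)
rows = list(iter_rows(doc))
for date, time, team in rows:
    if date == "2014-09-27":
        print(time, team)
```

With the sample table this prints the Kode IF and Floda BoIF matches; in the snippet, Romelanda UF is dated 2014-09-26, so it is not selected here.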

Answer

The following code iterates over however many rows there are. rows_xpath filters on the target date directly, so only the matching rows are returned. The XPath expressions are also compiled once, outside the for loop, which should make it faster.

    import lxml.html
    from lxml.etree import XPath

    url = "http://gbgfotboll.se/information/?scr=table&ftid=51168"
    date = '2014-09-27'
    rows_xpath = XPath("//*[@id='content-primary']/table[3]/tbody/tr[td[1]/span/span//text()='%s']" % (date))
    time_xpath = XPath("td[1]/span/span//text()[2]")
    team_xpath = XPath("td[2]/a/text()")
    html = lxml.html.parse(url)
    for row in rows_xpath(html):
        time = time_xpath(row)[0].strip()
        team = team_xpath(row)[0]
        print(time, team)
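The same filtering idea can be tried offline. This is a sketch of an assumed variant, not the answer's exact code: instead of splicing the date into the expression with % formatting, it binds it through an XPath variable ($date), which compiled lxml XPath objects accept as keyword arguments when called, so a value containing a quote cannot break the expression. It runs against the sample table from the question, so the row path is adapted to //table[@class='clCommonGrid']/tbody/tr (on the live page the answer's //*[@id='content-primary']/table[3]/tbody/tr would be used instead), and the function name matches_on is hypothetical.

```python
import lxml.html
from lxml.etree import XPath

# Match rows from the sample table in the question (header omitted).
SAMPLE = """<html><table class="clCommonGrid" cellspacing="0"><tbody class="clGrid">
<tr class="clTrOdd"><td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-26<!-- br ok --> 19:30</span></span></td><td><a href="?scr=result&amp;fmid=2669197">Guldhedens IK - IF Warta</a></td><td><a href="?scr=venue&amp;faid=847">Guldheden Södra 1 Konstgräs</a> </td></tr>
<tr class="clTrEven"><td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-26<!-- br ok --> 13:00</span></span></td><td><a href="?scr=result&amp;fmid=2669176">Romelanda UF - IK Virgo</a></td><td><a href="?scr=venue&amp;faid=941">Romevi 1 Gräs</a> </td></tr>
<tr class="clTrOdd"><td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-27<!-- br ok --> 13:00</span></span></td><td><a href="?scr=result&amp;fmid=2669167">Kode IF - IK Kongahälla</a></td><td><a href="?scr=venue&amp;faid=912">Kode IP 1 Gräs</a> </td></tr>
<tr class="clTrEven"><td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-27<!-- br ok --> 14:00</span></span></td><td><a href="?scr=result&amp;fmid=2669147">Floda BoIF - Partille IF FK </a></td><td><a href="?scr=venue&amp;faid=218">Flodala IP 1</a> </td></tr>
</tbody></table></html>"""

# Compiled once. $date is an XPath variable supplied at call time, so the
# date string is never pasted into the expression itself.
rows_xpath = XPath(
    "//table[@class='clCommonGrid']/tbody/tr"
    "[td[1]/span/span//text() = $date]"
)
time_xpath = XPath("td[1]/span/span//text()[2]")  # text node after the comment
team_xpath = XPath("string(td[2]/a)")             # link text as a plain string


def matches_on(doc, date):
    """Hypothetical helper: (time, team) pairs for the rows dated `date`."""
    return [(time_xpath(row)[0].strip(), team_xpath(row).strip())
            for row in rows_xpath(doc, date=date)]


doc = lxml.html.fromstring(SAMPLE)
for time, team in matches_on(doc, "2014-09-27"):
    print(time, team)
```

A date with no matches simply yields an empty list, so the loop never has to guess how many rows exist.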