This is the HTML source of the table I want to extract data from.
Webpage: http://gbgfotboll.se/information/?scr=table&ftid=51168 (the table is at the bottom of the page).
<html>
<table class="clCommonGrid" cellspacing="0">
  <thead>
    <tr><td colspan="3">Kommande matcher</td></tr>
    <tr><th style="width:1%;">Tid</th><th style="width:69%;">Match</th><th style="width:30%;">Arena</th></tr>
  </thead>
  <tbody class="clGrid">
    <tr class="clTrOdd">
      <td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-26<!-- br ok --> 19:30</span></span></td>
      <td><a href="?scr=result&fmid=2669197">Guldhedens IK - IF Warta</a></td>
      <td><a href="?scr=venue&faid=847">Guldheden Södra 1 Konstgräs</a> </td>
    </tr>
    <tr class="clTrEven">
      <td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-26<!-- br ok --> 13:00</span></span></td>
      <td><a href="?scr=result&fmid=2669176">Romelanda UF - IK Virgo</a></td>
      <td><a href="?scr=venue&faid=941">Romevi 1 Gräs</a> </td>
    </tr>
    <tr class="clTrOdd">
      <td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-27<!-- br ok --> 13:00</span></span></td>
      <td><a href="?scr=result&fmid=2669167">Kode IF - IK Kongahälla</a></td>
      <td><a href="?scr=venue&faid=912">Kode IP 1 Gräs</a> </td>
    </tr>
    <tr class="clTrEven">
      <td nowrap="nowrap" class="no-line-through"><span class="matchTid"><span>2014-09-27<!-- br ok --> 14:00</span></span></td>
      <td><a href="?scr=result&fmid=2669147">Floda BoIF - Partille IF FK </a></td>
      <td><a href="?scr=venue&faid=218">Flodala IP 1</a> </td>
    </tr>
  </tbody>
</table>
</html>
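In the table above, the `<!-- br ok -->` comment splits each time cell into two text nodes, which is why an XPath ending in `span/span//text()` yields the date and the time as separate strings rather than one. A minimal check with lxml on a single cell copied from the table (written for Python 3; the variable names are only illustrative):

```python
import lxml.html

# One time cell copied from the table (outer span plus its contents).
cell = lxml.html.fromstring(
    '<span class="matchTid"><span>2014-09-26<!-- br ok --> 19:30</span></span>'
)

# //text() skips the comment itself but returns the text on either side of it
# as two separate strings; the second may carry the space that follows the
# comment, hence the strip().
raw = cell.xpath("./span//text()")
date, time = [t.strip() for t in raw]
print(date, time)
```

This is also why the indices `[0]` (date) and `[1]` (time) work on the result of that XPath.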
Right now I have this code, which produces the result I want:
import lxml.html

url = "http://gbgfotboll.se/information/?scr=table&ftid=51168"
html = lxml.html.parse(url)

for i in range(12):
    xpath1 = ".//*[@id='content-primary']/table[3]/tbody/tr[%d]/td[1]/span/span//text()" % (i + 1)
    xpath2 = ".//*[@id='content-primary']/table[3]/tbody/tr[%d]/td[2]/a/text()" % (i + 1)
    time = html.xpath(xpath1)[1]
    date = html.xpath(xpath1)[0]
    teamName = html.xpath(xpath2)[0]
    if date == '2014-09-27':
        print time, teamName
This gives the following result:
13:00 Romelanda UF - IK Virgo
13:00 Kode IF - IK Kongahälla
14:00 Floda BoIF - Partille IF FK
Now to the question. I don't want to use a for loop over range(12) because it isn't robust: the number of rows in that table can change. If a row doesn't exist, html.xpath(...) returns an empty list and indexing it with [0] or [1] raises an IndexError; if there are more than 12 rows, the extra ones are silently skipped. So my question is: how can I iterate over the rows safely, so that the loop visits exactly the rows that are present in the table, no more and no less? Also, if you have any other suggestions for making the code better or faster, please go ahead.
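A minimal sketch of one common approach, assuming the page layout still matches the XPath used in the question: select the `<tr>` elements themselves and evaluate short relative XPaths on each row, so the loop runs exactly once per row that exists. It also evaluates the time-cell XPath once per row instead of twice. The function name `matches_on` and the rule of skipping rows without the expected cells are hypothetical choices for illustration, not part of lxml (written for Python 3).

```python
import lxml.html

# Row selector taken from the question's XPath; assumed to still match the live page.
ROWS_XPATH = ".//*[@id='content-primary']/table[3]/tbody/tr"


def matches_on(doc, wanted_date, rows_xpath=ROWS_XPATH):
    """Return (time, match) pairs for the rows whose date equals wanted_date.

    doc can be the tree returned by lxml.html.parse() or an element returned
    by lxml.html.fromstring(). The loop runs once per <tr> that actually
    exists, so the number of rows never has to be known in advance.
    """
    results = []
    for row in doc.xpath(rows_xpath):
        # XPaths starting with "./" are evaluated relative to the current row.
        # The <!-- br ok --> comment splits the cell into date and time strings.
        when = [t.strip() for t in row.xpath("./td[1]/span/span//text()")]
        names = row.xpath("./td[2]/a/text()")
        if len(when) < 2 or not names:
            # Assumption: silently skip rows that lack the expected cells.
            continue
        date, time = when[0], when[1]
        if date == wanted_date:
            results.append((time, names[0].strip()))
    return results
```

Against the live page this would be called as `matches_on(lxml.html.parse(url), '2014-09-27')`, printing each `(time, match)` pair in a loop. Because `table[3]` depends on the table's position on the page, selecting on the `clCommonGrid`/`clGrid` classes from the HTML source is an alternative worth trying, though other tables on the page might use the same classes.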