I am scraping athletic.net, a website that stores track and field times. So far I can print the event titles and times, but under each event my output contains every time that follows it in the season rather than only the times for that event. I currently call find_next_sibling() in a for loop with an arbitrary, fixed number of iterations; instead, I would like to keep calling find_next_sibling() until the sibling is an h5 tag, because the h5 tags are the titles of each event. In short, how can I stop my loop when find_next_sibling() returns an h5 tag? I think this should be a simple while loop, but I have struggled to implement it.
for text in soup.find_all('h5'):
    if "Season" in str(text):
        text_file.write(('\n' + '\n' + str(text.contents[0])) + '\n')
    else:
        text_file.write(str(text.contents[0]) + '\n')
        block = ""
        for i in range(0,100):
            try:
                text = text.find_next_sibling()
                block = block + str(text) + '\n'
            except:
                print("miss")
        soupBlock = BeautifulSoup(block)
        for t in soupBlock.select('tr td:nth-of-type(2) [href^="/result"]'):
            text_file.write(str(t.contents[0]) + '\n')
Output:
2021 Outdoor Season
800 Meters
2:14.81
2:12.32
4:43.62
4:44.21
4:42.11
10:26.85
10:09.89
10:21.49
1600 Meters
4:43.62
4:44.21
4:42.11
10:26.85
10:09.89
10:21.49
3200 Meters
10:26.85
10:09.89
10:21.49
Desired output:
2021 Outdoor Season
800 Meters
2:14.81
2:12.32
1600 Meters
4:43.62
4:44.21
4:42.11
3200 Meters
10:26.85
10:09.89
10:21.49
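
To make the stopping condition concrete, here is a minimal sketch of the kind of while loop described above. It assumes each event's results sit in sibling elements between one h5 and the next. The sample HTML and the helper name times_for_event are hypothetical stand-ins, not the real athletic.net markup, so the selector may need adjusting for the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: each event title is an
# <h5>, followed by sibling elements that hold that event's results.
SAMPLE_HTML = """
<div>
  <h5>800 Meters</h5>
  <table>
    <tr><td>1</td><td><a href="/result/1">2:14.81</a></td></tr>
    <tr><td>2</td><td><a href="/result/2">2:12.32</a></td></tr>
  </table>
  <h5>1600 Meters</h5>
  <table>
    <tr><td>1</td><td><a href="/result/3">4:43.62</a></td></tr>
    <tr><td>2</td><td><a href="/result/4">4:44.21</a></td></tr>
    <tr><td>3</td><td><a href="/result/5">4:42.11</a></td></tr>
  </table>
</div>
"""


def times_for_event(heading):
    """Collect times from the siblings after `heading`, stopping at the next <h5>."""
    times = []
    sibling = heading.find_next_sibling()
    # Stop when there are no more siblings or the next event title is reached.
    while sibling is not None and sibling.name != 'h5':
        for link in sibling.select('tr td:nth-of-type(2) [href^="/result"]'):
            times.append(link.get_text(strip=True))
        sibling = sibling.find_next_sibling()
    return times


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
results = {
    h5.get_text(strip=True): times_for_event(h5)
    for h5 in soup.find_all('h5')
}

for event, times in results.items():
    print(event)
    for t in times:
        print(t)
```

Checking `sibling is not None` before reading `sibling.name` is what replaces the try/except: find_next_sibling() returns None once the siblings run out, so the loop ends cleanly on the last event as well.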