For the HTML below, I need to get each date together with its own times, hrefs, formats, and so on (other fields not shown).
<div class="showtimes">
  <h2>The Little Prince</h2>
  <div class="poster" data-poster-url="http://www.test.com"><img src="http://www.test.com"></div>
  <div class="showstimes">
    <div class="date">9 December, Wednesday</div>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">3D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">18:30</a><span class="show-format">3D</span></span>
    <div class="date">10 December, Thursday</div>
    <span class="show-time techno-2d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">2D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
  </div>
</div>
To do this, I use the following Python code:
for dates in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
    date = dates.xpath('.//text()')[0]
    # for times in dates.xpath('//following-sibling::span[1 = count(preceding-sibling::div[1] | (.//div[@class="date"])[1])]'):
    # for times in dates.xpath('//following-sibling::span[contains(@class,"show-time")]'):
    # for times in dates.xpath('.//../span[contains(@class,"show-time")]'):
    # for times in dates.xpath('//following-sibling::span[preceding-sibling::div[1][.="date"]]'):
        time = times.xpath('.//a/text()')[0]
        url = times.xpath('.//a/@href')[0]
        format_type = times.xpath('.//span[@class="show-format"]/text()')[0]
Getting the dates is not a problem, but I can't work out how to get the rest of the information for each particular date. I have tried many different approaches without luck (some of them are left in the comments). I can't find a way to handle the case where the nodes I need follow one another at the same level (as siblings) instead of being nested inside the date. In this case:
-> div Date1
-> span Time1
-> span href1
-> span Format1
-> span Time2
-> span href2
-> span Format2
-> span Time3
-> span href3
-> span Format3
-> div Date2
-> span Time1
-> span href1
-> span Format1
# etc etc
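To make the flat layout above concrete, here is a minimal sketch of one possible way to pair each date with the show-time spans that follow it: walk the direct children of the container in document order and attach every span to the most recently seen date div. This is only an illustration under some assumptions. It uses the standard-library xml.etree.ElementTree instead of the parser behind movie.xpath, the markup is a trimmed copy of the sample (the unclosed img is left out so the XML parser accepts it), and SAMPLE, group_showtimes and schedule are made-up names. One note on the commented attempts: an XPath expression that starts with // is evaluated from the document root, not from the current date node, so those queries are not restricted to the siblings of the current div.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample markup (assumption: only the inner
# "showstimes" container matters for grouping).
SAMPLE = """
<div class="showstimes">
  <div class="date">9 December, Wednesday</div>
  <span class="show-time techno-3d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">3D</span></span>
  <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
  <span class="show-time techno-3d"><a href="http://www.test.com" class="link">18:30</a><span class="show-format">3D</span></span>
  <div class="date">10 December, Thursday</div>
  <span class="show-time techno-2d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">2D</span></span>
  <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
</div>
"""


def group_showtimes(container):
    """Attach each show-time span to the closest preceding date div."""
    grouped = []
    current = None  # entry for the date div seen most recently
    for child in container:  # direct children, in document order
        classes = (child.get("class") or "").split()
        if child.tag == "div" and "date" in classes:
            # A new date starts a new group.
            current = {"date": (child.text or "").strip(), "shows": []}
            grouped.append(current)
        elif child.tag == "span" and "show-time" in classes and current is not None:
            link = child.find("a")    # holds the time text and the href
            fmt = child.find("span")  # the nested show-format span
            if link is None:
                continue  # skip spans without a link
            current["shows"].append({
                "time": (link.text or "").strip(),
                "url": link.get("href"),
                "format": (fmt.text or "").strip() if fmt is not None else None,
            })
    return grouped


schedule = group_showtimes(ET.fromstring(SAMPLE))
for day in schedule:
    print(day["date"], [(s["time"], s["format"]) for s in day["shows"]])
```

The loop relies only on iteration, .tag, .get, .find and .text, so it should carry over to other ElementTree-compatible element objects with little change, though that is untested here.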