No nested nodes: how to get one piece of information and then the additional info that belongs to it?

2024/10/16 3:28:56

For the markup below, I need to get each date together with its own times, hrefs, formats, and so on (other fields not shown).

<div class="showtimes">
  <h2>The Little Prince</h2>
  <div class="poster" data-poster-url="http://www.test.com"><img src="http://www.test.com"></div>
  <div class="showstimes">
    <div class="date">9 December, Wednesday</div>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">3D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">18:30</a><span class="show-format">3D</span></span>
    <div class="date">10 December, Thursday</div>
    <span class="show-time techno-2d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">2D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
  </div>
</div>

To do this, I use the following Python code:

for dates in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
    date = dates.xpath('.//text()')[0]
    # Attempts at the inner loop (none of them works):
    # for times in dates.xpath('//following-sibling::span[1 = count(preceding-sibling::div[1] | (.//div[@class="date"])[1])]'):
    # for times in dates.xpath('//following-sibling::span[contains(@class,"show-time")]'):
    # for times in dates.xpath('.//../span[contains(@class,"show-time")]'):
    # for times in dates.xpath('//following-sibling::span[preceding-sibling::div[1][.="date"]]'):
        time = times.xpath('.//a/text()')[0]
        url = times.xpath('.//a/@href')[0]
        format_type = times.xpath('.//span[@class="show-format"]/text()')[0]

Getting the dates is not a problem, but I can't work out how to get the remaining info for each particular date. I tried many different ways with no luck (some of them are in the comments above). I can't find a way to handle the case where the nodes I need follow one another on the same level (as siblings) instead of being nested. In this case:

-> div Date1
-> span Time1
-> span href1
-> span Format1
-> span Time2
-> span href2
-> span Format2
-> span Time3
-> span href3
-> span Format3
-> div Date2
-> span Time1
-> span href1
-> span Format1
# etc etc

Answer

It turns out that lxml supports referencing a Python variable from an XPath expression, which proves useful here: for every date div, you can select the following-sibling spans whose nearest preceding-sibling div equals the current date div, with the current date div passed in through the Python variable dates. Note that = compares the string values of the two divs, not node identity, so this assumes that each date's text is unique within the container:

for dates in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
    date = dates.xpath('normalize-space()')
    # $current is bound to the current date div through the keyword argument
    for times in dates.xpath('following-sibling::span[preceding-sibling::div[1]=$current]', current=dates):
        time = times.xpath('a/text()')[0]
        url = times.xpath('a/@href')[0]
        format_type = times.xpath('span/text()')[0]
        print(date, time, url, format_type)

Output:

'9 December, Wednesday', '12:30', 'http://www.test.com', '3D'
'9 December, Wednesday', '15:30', 'http://www.test.com', '3D'
'9 December, Wednesday', '18:30', 'http://www.test.com', '3D'
'10 December, Thursday', '12:30', 'http://www.test.com', '2D'
'10 December, Thursday', '15:30', 'http://www.test.com', '3D'
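
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes lxml is installed and that the page is parsed with lxml.html; the names HTML, extract_showtimes and rows are made up for this example and are not part of the original code.

```python
from lxml import html

# Markup copied from the question (one film, two dates).
HTML = """
<div class="showtimes">
  <h2>The Little Prince</h2>
  <div class="poster" data-poster-url="http://www.test.com"><img src="http://www.test.com"></div>
  <div class="showstimes">
    <div class="date">9 December, Wednesday</div>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">3D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">18:30</a><span class="show-format">3D</span></span>
    <div class="date">10 December, Thursday</div>
    <span class="show-time techno-2d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">2D</span></span>
    <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
  </div>
</div>
"""


def extract_showtimes(movie):
    """Return (date, time, url, format) tuples for one movie element."""
    rows = []
    for date_div in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
        date = date_div.xpath('normalize-space()')
        # Keep only the spans whose nearest preceding div has the same
        # string value as the current date div ($current).
        spans = date_div.xpath(
            'following-sibling::span[preceding-sibling::div[1] = $current]',
            current=date_div,
        )
        for span in spans:
            rows.append((
                date,
                span.xpath('a/text()')[0],
                span.xpath('a/@href')[0],
                span.xpath('span/text()')[0],
            ))
    return rows


movie = html.fromstring(HTML)
rows = extract_showtimes(movie)
for row in rows:
    print(row)
```

Collecting tuples in a list instead of printing inside the loop is only a convenience here, so the result can be checked or written out later.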

References:

  • https://stackoverflow.com/a/17750629/2998271
  • http://lxml.de/xpathxslt.html#the-xpath-method
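
As a side note, a different technique from the XPath variable above is to walk the children of the container once and remember the most recent date. This is only a sketch under assumptions: it uses the standard-library xml.etree.ElementTree on the inner showstimes div alone (that fragment happens to be well-formed XML), and FRAGMENT, group_by_date and grouped are names invented for the example. The same loop should also work on an lxml element, since it only relies on iteration, get() and find().

```python
import xml.etree.ElementTree as ET

# Only the inner container from the question; it parses as XML as-is.
FRAGMENT = """
<div class="showstimes">
  <div class="date">9 December, Wednesday</div>
  <span class="show-time techno-3d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">3D</span></span>
  <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
  <span class="show-time techno-3d"><a href="http://www.test.com" class="link">18:30</a><span class="show-format">3D</span></span>
  <div class="date">10 December, Thursday</div>
  <span class="show-time techno-2d"><a href="http://www.test.com" class="link">12:30</a><span class="show-format">2D</span></span>
  <span class="show-time techno-3d"><a href="http://www.test.com" class="link">15:30</a><span class="show-format">3D</span></span>
</div>
"""


def group_by_date(container):
    """Walk the direct children in document order, tracking the latest date."""
    rows = []
    current_date = None
    for child in container:
        classes = child.get("class", "").split()
        if child.tag == "div" and "date" in classes:
            # A new date starts; every show-time span after it belongs to it.
            current_date = "".join(child.itertext()).strip()
        elif child.tag == "span" and "show-time" in classes and current_date is not None:
            link = child.find("a")
            fmt = child.find("span")
            rows.append((current_date, link.text, link.get("href"), fmt.text))
    return rows


container = ET.fromstring(FRAGMENT.strip())
grouped = group_by_date(container)
for row in grouped:
    print(row)
```

Because the grouping is positional, it does not depend on the date texts being distinct, and it touches each sibling only once.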
