How to scrape all p tags and their corresponding h2 tag with Selenium?

2024/11/10 13:21:22

I want to get the title and content of each section of an article. Example page: https://facts.net/best-survival-movies/. I want to collect all the p elements that belong to each h2[tcontent-title].


The expected result is:

title = [title1, title2, title3]
content = [content1, content2, content3]

All p strings under the first title should be appended to content1, all p strings under the second title to content2, and all p strings under the third title to content3. Can you help me?

Answer

The solution from your last question does not work because some <p> elements are not siblings of the <h2> elements: they are nested in an <aside>, so preceding-sibling fails for them.

You could switch to the preceding axis instead, but that would also pick up the <p> elements inside the <aside>. To fix this, select the elements more specifically:

driver.find_elements(By.CSS_SELECTOR,'.single-title-desc-wrap p:not(aside>p)')

Example

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://facts.net/best-survival-movies/')

# One entry per <h2>, keyed by its text, starting with empty content
data = dict((e.text, '') for e in driver.find_elements(By.CSS_SELECTOR, '.single-title-desc-wrap h2'))

# Append each <p> that is not inside an <aside> to its nearest preceding <h2> sibling
for p in driver.find_elements(By.CSS_SELECTOR, '.single-title-desc-wrap p:not(aside>p)'):
    title = p.find_element(By.XPATH, './preceding-sibling::h2[1]').text
    data[title] = (data[title] + ' ' + p.text).strip()

result = [{'title': x, 'content': y} for x, y in data.items()]
print(result)
driver.quit()
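
The grouping step can also be tried without a browser. The sketch below is not the Selenium approach from the answer: it swaps in Python's standard-library xml.etree.ElementTree and runs on a made-up, well-formed snippet. SAMPLE and group_paragraphs are hypothetical names, and the markup only imitates the structure described above (h2 and p as siblings, with some p nested in an aside). It walks the direct children of the wrapper, so a p inside an aside is never visited, and then builds the separate title and content lists the question asks for.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the article markup (not the real page).
SAMPLE = """
<div class="single-title-desc-wrap">
  <h2>Title 1</h2>
  <p>First paragraph.</p>
  <aside><p>Read also: unrelated teaser</p></aside>
  <p>Second paragraph.</p>
  <h2>Title 2</h2>
  <p>Only paragraph.</p>
</div>
"""


def group_paragraphs(wrapper):
    """Map each h2 text to the joined text of the direct-child p elements after it."""
    parts_by_title = {}
    current = None
    # Iterating an Element yields its direct children only,
    # so a <p> nested inside <aside> is skipped.
    for child in wrapper:
        if child.tag == "h2":
            current = "".join(child.itertext()).strip()
            parts_by_title[current] = []
        elif child.tag == "p" and current is not None:
            parts_by_title[current].append("".join(child.itertext()).strip())
    return {title: " ".join(parts) for title, parts in parts_by_title.items()}


grouped = group_paragraphs(ET.fromstring(SAMPLE.strip()))
titles = list(grouped)             # title = [title1, title2, ...]
contents = list(grouped.values())  # content = [content1, content2, ...]
print(titles)
print(contents)
```

Paragraphs that appear before the first h2 are ignored here (current is still None); whether that suits the real page is an assumption worth checking against its markup.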
