CNN news web scraper returns an empty list [] with no information

2024/10/6 16:16:31

So far I have written this code:

from urllib import request
from bs4 import BeautifulSoup
import requests
import csv
import re

serch_term = input('What News are you looking for today? ')
url = f'https://edition.cnn.com/search?q={serch_term}'
page = requests.get(url).text
doc = BeautifulSoup(page, "html.parser")
page_text = doc.find_all('<h3 class="cnn-search__result-headline">')
print(page_text)

But I'm getting an empty list [] as the result when I print(page_text). Can someone help me?

Answer

There are several issues:

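  • The content is loaded dynamically by JavaScript, so you won't get it with requests, which only fetches the initial HTML.

  • We do not know your search term; maybe there are no results for it.

  • BeautifulSoup's find_all() does not accept a raw tag string like <h3 class="cnn-search__result-headline"> as a selector. It expects a tag name, optionally with attribute filters, or you can use select() with a CSS selector.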
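
The selection issue in the last point can be reproduced offline, without any network access. This is a minimal sketch, assuming a small hand-written snippet (sample_html is hypothetical, not real CNN markup) in place of the rendered page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a rendered results page (not real CNN markup).
sample_html = """
<div>
  <h3 class="cnn-search__result-headline"><a href="//www.cnn.com/a">First</a></h3>
  <h3 class="cnn-search__result-headline"><a href="//www.cnn.com/b">Second</a></h3>
</div>
"""

doc = BeautifulSoup(sample_html, "html.parser")

# A raw HTML string is treated as a tag *name*; no tag has that name,
# so nothing matches.
wrong = doc.find_all('<h3 class="cnn-search__result-headline">')

# Pass the tag name and the class separately...
by_name = doc.find_all("h3", class_="cnn-search__result-headline")

# ...or use a CSS selector.
by_css = doc.select("h3.cnn-search__result-headline")

print(len(wrong), len(by_name), len(by_css))
```

Even with the selector fixed, the original requests-based code would likely still return nothing if the headlines are only added to the page by JavaScript.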

How to fix? Use Selenium, which works like a browser, also renders JavaScript, and can provide the page_source as expected.

Example

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium at your local chromedriver binary
service = Service(executable_path='YOUR PATH TO CHROMEDRIVER')
driver = webdriver.Chrome(service=service)

# Load the page in a real browser so the JavaScript runs
driver.get('https://edition.cnn.com/search?q=python')

# Parse the rendered HTML and select the headlines with a CSS selector
soup = BeautifulSoup(driver.page_source, 'html.parser')
soup.select('h3.cnn-search__result-headline')

Output

[<h3 class="cnn-search__result-headline"><a href="//www.cnn.com/travel/article/airasia-malaysia-snake-plane-rerouted-intl-hnk/index.html">AirAsia flight in Malaysia rerouted after snake found on board plane</a></h3>,
 <h3 class="cnn-search__result-headline"><a href="//www.cnn.com/2021/11/19/cnn-underscored/athleta-gift-shop-holiday/index.html">With gift options under $50 plus splurge-worthy seasonal staples, Athleta's Gift Shop is a holiday shopping haven</a></h3>,
 ...]

To get the title, read the .text attribute of each element while iterating over your ResultSet. To grab the value of href, use ['href'] on the <a> it contains.
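
As a rough sketch of that iteration, the snippet below parses the two headlines from the output above as a static string instead of driver.page_source, so it runs without a browser. The names html, BASE_URL, and results are illustrative, and resolving the protocol-relative links with urljoin is an optional extra, not part of the original answer:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# The two headlines from the output above, used in place of driver.page_source.
html = """
<h3 class="cnn-search__result-headline"><a href="//www.cnn.com/travel/article/airasia-malaysia-snake-plane-rerouted-intl-hnk/index.html">AirAsia flight in Malaysia rerouted after snake found on board plane</a></h3>
<h3 class="cnn-search__result-headline"><a href="//www.cnn.com/2021/11/19/cnn-underscored/athleta-gift-shop-holiday/index.html">With gift options under $50 plus splurge-worthy seasonal staples, Athleta's Gift Shop is a holiday shopping haven</a></h3>
"""

# Page the links were found on; used to resolve "//host/path" style hrefs.
BASE_URL = "https://edition.cnn.com/search?q=python"

soup = BeautifulSoup(html, "html.parser")

results = []
for headline in soup.select("h3.cnn-search__result-headline"):
    link = headline.find("a")
    # Skip headlines without a usable link instead of raising a KeyError.
    if link is None or not link.has_attr("href"):
        continue
    results.append({
        "title": headline.text.strip(),
        "url": urljoin(BASE_URL, link["href"]),
    })

for item in results:
    print(item["title"], "->", item["url"])
```

With Selenium, the same loop should work if html is replaced by driver.page_source.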

