Python: extracting an element using bs4, a very basic thing I think I don't understand

2024/10/5 16:22:51

So I'm using Beautiful Soup to try to get an element off of a page using the tag and class. Here is my code:

import requests
from bs4 import BeautifulSoup

# Send a GET request to the webpage
url = "https://www.hindawi.com/journals/am/2021/1623076/"
response = requests.get(url)
# Parse the HTML content of the webpage
soup = BeautifulSoup(response.text, 'html.parser')
results = soup.find_all('span', class_='simpleShowMore')
print(results)

which I took almost directly from their example. If you look at the website, the values are clearly there, but Beautiful Soup isn't finding them. The site looks like this:

[Screenshot: the string I am trying to extract]

The output of this is:

[]

I'm sure I am doing something very simple wrong. I believe a lot of the examples I found are out of date. Please help?

Thanks
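A quick check can narrow down why `find_all` returns `[]`: either the selector is wrong, or the element simply is not in the HTML that `requests` received (the browser inspector shows the page after JavaScript has run, which can differ). The sketch below assumes you pass it exactly `response.text`; `diagnose_missing_element` is a hypothetical helper, not part of Beautiful Soup, and its substring test is deliberately crude.

```python
from bs4 import BeautifulSoup


def diagnose_missing_element(html: str, tag: str, class_name: str) -> str:
    """Rough guess at why soup.find_all(tag, class_=class_name) came back empty.

    `html` should be the raw text the server sent (e.g. response.text),
    not the DOM you see in the browser after scripts have run.
    """
    soup = BeautifulSoup(html, "html.parser")
    if soup.find_all(tag, class_=class_name):
        return "found: the selector matches the served HTML"
    if class_name in html:
        # The class name occurs somewhere: on a different tag, or inside a script/JSON blob.
        return f"class name present, but not on a <{tag}> element"
    # Nothing in the served HTML mentions the class, so a script probably adds it later.
    return "class name absent: element is likely rendered client-side"
```

For the question's page you would call `diagnose_missing_element(response.text, "span", "simpleShowMore")` right after the `requests.get` line.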

Answer

I assume that you would like to obtain the address of all institutions.

If you analyze the page's source code, you can see that, contrary to the other answers and comments, the information is already included in the HTML and is only rendered by a script afterwards. Because of this, you do not need Selenium or any other browser-automation tool to perform the click that loads the data.
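To see this mechanism without hitting the network, here is a minimal sketch on a hypothetical, heavily trimmed page. The `__NEXT_DATA__` id the answer relies on is the convention Next.js sites use to embed page data as JSON; the HTML below and the helper `read_next_data` are illustrative assumptions, not the real page's markup.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical trimmed page: no <span> with the target class is served,
# only a JSON payload that a script would turn into visible markup later.
RAW_HTML = """
<html><body>
  <div id="__next"></div>
  <script id="__NEXT_DATA__" type="application/json">
    {"props": {"pageProps": {"article": {"affiliations": [
      {"addrLines": [{"addrLine1": "Dept. A", "addrLine2": "Univ. B", "addrLine3": "City 1"}]}
    ]}}}}
  </script>
</body></html>
"""


def read_next_data(html: str) -> dict:
    """Parse the JSON inside <script id="__NEXT_DATA__">; raise if the tag is missing."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", id="__NEXT_DATA__")
    if script is None:
        raise ValueError("no __NEXT_DATA__ script tag in this HTML")
    return json.loads(script.string)


soup = BeautifulSoup(RAW_HTML, "html.parser")
# The selector is fine; the element just isn't in the served HTML.
print(soup.find_all("span", class_="simpleShowMore"))  # []

data = read_next_data(RAW_HTML)
first_line = data["props"]["pageProps"]["article"]["affiliations"][0]["addrLines"][0]
print(first_line["addrLine1"])  # Dept. A
```

Raising a clear `ValueError` instead of letting `.text` fail on `None` makes it obvious when a page does not use this embedding at all.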

Here is an example of how you can access the information:

from bs4 import BeautifulSoup
import requests
import json

# Send a GET request to the webpage
url = "https://www.hindawi.com/journals/am/2021/1623076/"
response = requests.get(url)
# Parse the HTML content of the webpage
soup = BeautifulSoup(response.text, 'html.parser')
# Extract the text of the element containing the data that gets rendered later
new_data = soup.find(attrs={'id': '__NEXT_DATA__'}).text
data = json.loads(new_data)
# Extract the data and transform it into a list of address strings
def join_address_lines(entry):
    return ', '.join([line['addrLine1'] + ', ' + line['addrLine2'] + ', ' + line['addrLine3'] for line in entry['addrLines']])
address_lines_array = list(map(join_address_lines, data['props']['pageProps']['article']['affiliations']))
# Print the results
print(address_lines_array)
# --> ['School of Architecture and Materials, Chongqing College of Electronic Engineering, Chongqing 401331', 'College of Geography and Tourism, Chongqing Normal University, Chongqing 401331']
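The answer's `join_address_lines` builds each string with `+`, so it would raise `KeyError` if an affiliation lacked one of `addrLine1`..`addrLine3`, and `TypeError` if one of them were `None`. Whether this site ever omits those fields is not known; the sketch below is a defensive variant under that assumption, shown on hypothetical records shaped like the answer's JSON.

```python
def join_address_lines(affiliation: dict) -> str:
    """Join addrLine1..addrLine3 of every addrLines entry, skipping missing or empty parts."""
    parts = []
    for line in affiliation.get("addrLines", []):
        for key in ("addrLine1", "addrLine2", "addrLine3"):
            value = line.get(key)
            if value:  # skips absent keys, None and empty strings
                parts.append(value.strip())
    return ", ".join(parts)


# Hypothetical affiliation records; the second one has no addrLine2.
affiliations = [
    {"addrLines": [{"addrLine1": "School of X", "addrLine2": "College Y", "addrLine3": "City 1"}]},
    {"addrLines": [{"addrLine1": "Dept. Z", "addrLine3": "City 2"}]},
]
print([join_address_lines(a) for a in affiliations])
# ['School of X, College Y, City 1', 'Dept. Z, City 2']
```

It drops into the answer unchanged: `list(map(join_address_lines, data['props']['pageProps']['article']['affiliations']))`.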
https://en.xdnf.cn/q/119672.html
