Scraping dynamic webpage using Python

2024/7/5 11:41:22

I am trying to scrape a dynamically generated webpage, https://www.governmentjobs.com/careers/capecoral?page=1, and have tried requests, scrapy, and scrapy-splash, but I only get the static page source and none of the job listings.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.governmentjobs.com/careers/capecoral?page=1")
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly
n_jobs = soup.select("#number-found-items")[0].text.strip()
print(n_jobs)

It always reports 0 jobs found.

Answer

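Because the page content is generated dynamically, requests only sees the initial HTML and does not run the JavaScript that fills in the listings. You can use selenium together with bs4 to get the rendered data. Here is an example; just run the code.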

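import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.governmentjobs.com/careers/capecoral?page=1"

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(10)  # crude fixed wait for the JavaScript-rendered listings

# Parse the rendered page and print each job title
soup = BeautifulSoup(driver.page_source, "lxml")
for title in soup.select(".list-item h3 > a"):
    print(title.text)

driver.quit()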

Output:

Assistant City Attorney / City Attorney's Office
Business Applications Analyst II / Information Technology Services #6425
Contract Athletic Official / Athletics / Parks & Recreation #6237
Contract Background Investigation Specialist / Investigations / Police Dept.  #6514
Contract Beverage Cart/Waiter/Waitress / Parks and Recreation / Coral Oaks #6479
Contract Counselor / Youth Center / Parks & Recreation #6317
Contract Counselor/Instructor / Parks & Recreation / Special Populations #6339
Contract Custodial Worker / Lake Kennedy / Parks & Recreation #6525
Contract Custodial Worker /Parks & Recreation / Yacht Club #6312
Contract Golf Course Outside Operations / Parks & Recreation / Coral Oaks  #6535
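
The fixed time.sleep(10) in the answer is a guess: too short on a slow connection, wasted time on a fast one. A polling wait returns as soon as the listings appear. The block below is a minimal sketch of that idea in plain Python; wait_until and fake_listings are hypothetical names introduced here for illustration and are not part of selenium or the original answer.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 20.0,
    interval: float = 0.5,
) -> T:
    """Call probe() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    (Hypothetical helper, shown only to illustrate polling.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for a page whose listings only show up on the third check.
    attempts = {"count": 0}

    def fake_listings():
        attempts["count"] += 1
        return ["Job A", "Job B"] if attempts["count"] >= 3 else []

    print(wait_until(fake_listings, timeout=5, interval=0.1))
```

With a real driver, the probe could be something like lambda: driver.find_elements(By.CSS_SELECTOR, ".list-item h3 > a"). Selenium also ships its own version of this pattern, WebDriverWait combined with expected_conditions, which is probably the better choice in practice.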