Scraping a specific website with a search box and JavaScript in Python

2024/9/21 4:35:43

On the website https://sray.arabesque.com/dashboard there is a search box (an "input" element in the HTML). I want to enter a company name in the search box, choose the first suggestion for that name in the drop-down menu (e.g., "Anglo American plc"), go to the URL with the information about that company, let the JavaScript load to get the full HTML of the resulting page, and then scrape it for the GC Score, ESG Score, and Temperature Score at the bottom.

!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
options = webdriver.ChromeOptions()
options.add_argument('-headless')
options.add_argument('-no-sandbox')
options.add_argument('-disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',options=options)
companies = ['Anglo American plc']
for company in companies:
    # dryscrape.start_xvfb()
    # session = dryscrape.Session()
    # session.visit("https://srayapi.arabesque.com/api/sray/company/history/004BTP-E")
    resp = wd.get('https://sray.arabesque.com/dashboard/')
    #print(driver.page_source)
    e = wd.find_element_by_id(id_='mat-input-0')
    e.send_keys(company)
    e.send_keys(Keys.ENTER)
    innerHTML = e.execute_script("return document.body.innerHTML")
    print(innerHTML)

I don't quite understand how to visit the URL with information about Anglo American and scrape it, given that we don't know that URL after entering the company name in the search box.

Answer

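You can do that using Selenium. There are a couple of things you need to update.

When running headless, you need to provide a window size.

Use WebDriverWait() to wait for each element before interacting with it; this avoids synchronization issues while the page's JavaScript is still rendering.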
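To show why an explicit wait helps on a JavaScript-rendered page: WebDriverWait(driver, timeout) re-evaluates its condition repeatedly (every 0.5 s by default, ignoring NoSuchElementException in between) until the condition returns something truthy or the timeout expires. The stand-alone sketch below imitates that polling loop so it can run without a browser. It is not Selenium's actual implementation; `wait_until` and `fake_find_score` are hypothetical names, and LookupError stands in for NoSuchElementException.

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5, ignored_exceptions=(LookupError,)):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Simplified imitation of an explicit wait's polling loop (not Selenium's code).
    """
    end_time = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            value = condition()
            if value:
                return value  # condition met: hand the result back to the caller
        except ignored_exceptions as exc:
            last_error = exc  # e.g. element not rendered yet; keep polling
        if time.monotonic() > end_time:
            raise TimeoutError(f"condition not met within {timeout}s") from last_error
        time.sleep(poll_frequency)


# Simulate a page element that only "appears" on the third check
attempts = {"n": 0}


def fake_find_score():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("not rendered yet")  # plays the role of NoSuchElementException
    return "57.03"


print(wait_until(fake_find_score, timeout=5, poll_frequency=0.01))  # prints 57.03
```

A fixed time.sleep() either wastes time or is too short; polling returns as soon as the element is ready and fails loudly only after the timeout.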

Code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Headless Chrome uses a small default viewport, so set a desktop-sized window
options.add_argument('--window-size=1920,1080')
wd = webdriver.Chrome(options=options)

companies = ['Anglo American plc']
for company in companies:
    wd.get('https://sray.arabesque.com/dashboard/')
    # Click the 'list' link before using the search box
    WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[text()='list']"))).click()
    # Type the company name into the search input
    WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='mat-input-0']"))).send_keys(company)
    # Choose the matching suggestion in the autocomplete drop-down
    WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, f"//span[contains(., '{company}')]"))).click()
    # Open the company's dashboard page
    WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, "(//span[normalize-space(.)='Open dashboard'])[1]"))).click()
    # Wait until the score tabs at the bottom of the page are visible
    WebDriverWait(wd, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.mat-tab-labels")))
    # Print GC Score, ESG Score and Temperature Score
    for label in ('GC Score', 'ESG Score', 'Temp'):
        print(wd.find_element(By.XPATH, f"//div[@class='mat-tab-label-content'][contains(.,'{label}')]/span").text)

wd.quit()

Output:

57.03
53.78
2.7°C
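The .text values come back as strings, and the temperature keeps its °C suffix. If you want to store or compare the scores as numbers, a small helper along these lines could convert them. `parse_score` is a hypothetical name, and the sketch assumes each label holds a single number formatted like the output above.

```python
import re


def parse_score(text):
    """Extract the first number from a score label such as '57.03' or '2.7°C'."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


# The three strings printed by the script above
raw = {"GC Score": "57.03", "ESG Score": "53.78", "Temperature Score": "2.7°C"}
scores = {name: parse_score(value) for name, value in raw.items()}
print(scores)  # {'GC Score': 57.03, 'ESG Score': 53.78, 'Temperature Score': 2.7}
```

Inside the scraping loop you would call parse_score() on each element's .text instead of printing it directly.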
