Want to scrape all the specific hrefs from the a tags

2024/10/6 9:14:12

I have searched for a specific brand, Samsung, which returns a number of products. I just want to scrape the href of each product in the search results, along with the product name.

import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
chrome_path =r'C:/Users/91940/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe'
driver = webdriver.Chrome(executable_path=chrome_path)
driver.implicitly_wait(10)
url = "https://www.lazada.sg"
driver.get(url)
driver.maximize_window()
soup=BeautifulSoup(driver.page_source, 'lxml')
application = driver.find_element_by_id("q")
application.send_keys("Samsung")
driver.find_element_by_css_selector(".search-box__button--1oH7").click()
div = driver.find_elements_by_tag_name('div', {'class': 'GridItem__title___8JShU'})
print(len(div))
for ele in div:
    print(a.get_attribute("href"))

Answer

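A couple of things. You are mixing bs4 syntax with Selenium, which is causing your current error: find_elements_by_tag_name accepts only a tag name, so the bs4-style attribute dictionary is not valid there. Additionally, you are targeting potentially dynamic values: class names such as GridItem__title___8JShU carry what look like generated suffixes, which may change. Finally, there are anti-scraping measures which may affect your work later.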

Ignoring the anti-scraping point, a more robust, syntactically correct version might be:

div = driver.find_elements_by_css_selector('[data-tracking="product-card"]')
links = [i.find_element_by_css_selector('[age="0"]').get_attribute('href') for i in div]
print(links)
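
Note that the find_element(s)_by_* helpers used here were deprecated in Selenium 4 and removed in later 4.x releases in favour of find_elements(by, value). Below is a minimal sketch of the same lookup in the newer call style. collect_links is a hypothetical helper name, and the selectors are the ones above, which depend on the site's markup at the time and may no longer match. The string "css selector" is the value of By.CSS_SELECTOR, written out only so the sketch does not need Selenium installed to be read or tested.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR; in a real script,
# import By and pass By.CSS_SELECTOR instead of this constant.
CSS_SELECTOR = "css selector"

# Selectors taken from the answer above; they are tied to the page markup.
CARD_SELECTOR = '[data-tracking="product-card"]'
LINK_SELECTOR = '[age="0"]'


def collect_links(driver):
    """Return the href of the first matching link inside each product card.

    `driver` is any object exposing Selenium 4's find_elements(by, value),
    for example a webdriver.Chrome instance once the search results loaded.
    """
    links = []
    for card in driver.find_elements(CSS_SELECTOR, CARD_SELECTOR):
        # find_elements returns an empty list instead of raising, so a card
        # without a matching link is skipped rather than aborting the run.
        anchors = card.find_elements(CSS_SELECTOR, LINK_SELECTOR)
        if anchors:
            links.append(anchors[0].get_attribute("href"))
    return links
```

With a live session this would be called as links = collect_links(driver) after the results page has rendered.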

You could reduce this to a single list comprehension with a different CSS selector combination, e.g.:

links = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')]

For that last one, you can return a dict keyed by product name as follows:

{i.find_element_by_tag_name('img').get_attribute('alt'):i.get_attribute('href') for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')}
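
One caveat with the dict version: keys must be unique, so if two listings share the same alt text, the later link silently replaces the earlier one. A small sketch that keeps every link by grouping them per name follows. group_links_by_name is a hypothetical helper, and the sample pairs are made-up placeholders, not scraped data.

```python
from collections import defaultdict


def group_links_by_name(pairs):
    """Group (name, href) pairs into {name: [href, ...]} in first-seen order."""
    grouped = defaultdict(list)
    for name, href in pairs:
        grouped[name].append(href)
    return dict(grouped)


# Placeholder data standing in for (alt, href) tuples read from the page.
sample = [
    ("Phone A", "https://example.com/1"),
    ("Phone B", "https://example.com/2"),
    ("Phone A", "https://example.com/3"),
]

print(dict(sample))                 # plain dict: one link survives for "Phone A"
print(group_links_by_name(sample))  # grouped: both "Phone A" links are kept
```

The same list of tuples can also be passed straight to pd.DataFrame, which keeps duplicate names as separate rows.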

As a dataframe:

import pandas as pd
pd.DataFrame([(i.find_element_by_tag_name('img').get_attribute('alt'), i.get_attribute('href')) for i in driver.find_elements_by_css_selector('[data-tracking="product-card"] div:nth-child(1) > [href*=search]')], columns = ['Title', 'Link'])
