Scraping data from href

2024/11/20 12:28:14

I am trying to get the postcodes for DFS stores. My approach is to get the href for each shop and click on it; the next page has the shop location, from which I can get the postal code. But I am not able to get this working. Where am I going wrong? I first select the upper-level element td.searchResults, and then for each result I try to click the href with title DFS and, after clicking, read the postalCode. Eventually I want to iterate over all three pages. If there is a better way to do it, let me know.

driver = webdriver.Firefox()
driver.get('http://www.localstore.co.uk/stores/75061/dfs/')
html = driver.page_source
soup = BeautifulSoup(html)
listings = soup.select('td.searchResults')
for l in listings:
    while True:
        driver.find_element_by_css_selector("a[title*='DFS']").click()
        shops = {}
        #info = soup.find('span', itemprop='postalCode').contents
        html = driver.page_source
        soup = BeautifulSoup(html)
        info = soup.find(itemprop="postalCode").get_text()
        shops.append(info)

Update:

driver = webdriver.Firefox()
driver.get('http://www.localstore.co.uk/stores/75061/dfs/')
html = driver.page_source
soup = BeautifulSoup(html)
listings = soup.select('td.searchResults')
for l in listings:
    driver.find_element_by_css_selector("a[title*='DFS']").click()
    shops = []
    html = driver.page_source
    soup = BeautifulSoup(html)
    info = soup.find_all('span', attrs={"itemprop": "postalCode"})
    for m in info:
        if m:
            m_text = m.get_text()
            shops.append(m_text)
    print (shops)

Answer

After playing with this for a little while, I don't think Selenium is the best way to do this. It would require calling driver.back() after each click, waiting for elements to re-appear, and a whole mess of other stuff. I was able to get what you want using just requests, re and bs4. re is included in the Python standard library; if you haven't installed requests, you can install it with pip:

pip install requests

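from bs4 import BeautifulSoup
import re
import requests

base_url = 'http://www.localstore.co.uk'
url = 'http://www.localstore.co.uk/stores/75061/dfs/'
res = requests.get(url)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(res.text, 'html.parser')

shops = []
# Store detail pages are the links whose href contains /store/
links = soup.find_all('a', href=re.compile(r'.*/store/.*'))
for l in links:
    full_link = base_url + l['href']
    # The town is the part of the link title after the first comma
    town = l['title'].split(',')[1].strip()
    # Fetch the store page and read the postal code from its markup
    res = requests.get(full_link)
    soup = BeautifulSoup(res.text, 'html.parser')
    info = soup.find('span', attrs={"itemprop": "postalCode"})
    postalcode = info.text
    shops.append(dict(town_name=town, postal_code=postalcode))

print(shops)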
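
The loop above builds each store URL by string concatenation and assumes every link title contains a comma. A slightly more defensive sketch of those two steps, using only the standard library, might look like the following. The helper names absolute_link and town_from_title, and the sample href and title values, are made up for illustration; they are not taken from the site.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.localstore.co.uk'


def absolute_link(href, base_url=BASE_URL):
    """Resolve a relative or already-absolute href against the site root."""
    return urljoin(base_url + '/', href)


def town_from_title(title):
    """Return the text after the first comma in a link title, or None."""
    parts = [p.strip() for p in title.split(',')]
    return parts[1] if len(parts) > 1 and parts[1] else None


# Hypothetical values, shaped like the ones the loop above expects.
print(absolute_link('/store/12345/dfs-example/'))
print(town_from_title('DFS, Exampletown'))
print(town_from_title('DFS'))
```

With helpers like these, a link whose title has no comma yields None for the town instead of raising an IndexError, so the loop could skip or log that entry rather than stop.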