I'm trying to scrape the Gold stock ticker from Yahoo! Finance.
from bs4 import BeautifulSoup
import requests, lxml
response = requests.get('https://finance.yahoo.com/quote/GC=F?p=GC=F')
soup = BeautifulSoup(response.text, 'lxml')
gold_price = soup.findAll("div", class_='My(6px) Pos(r) smartphone_Mt(6px)')[2].find_all('p').text
Whenever I run this it returns: list index out of range.
When I do print(len(soup)) it returns 4.
Any ideas?
Thank you.
You can make a direct request to the Yahoo server. To locate the query URL, open Dev tools (F12) -> Network tab -> Fetch/XHR and find the request whose name starts with spark?symbols= (refresh the page if you don't see any). Find the symbol you need and inspect the response in the Preview tab of the panel that opens on the right.
You can make direct requests to any of these links as long as the request method is GET, since POST requests are much more complicated.
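As a rough illustration of how the query URL from the Network tab is put together, here is a minimal sketch that rebuilds it from its query parameters using only the standard library. The parameter names and values are copied from the URL used in this answer; the helper name build_spark_url is made up for the example, and whether other symbols, ranges, or intervals are accepted by the endpoint is an assumption.

```python
from urllib.parse import urlencode

# Endpoint as seen in the Network tab (the part of the URL before the "?").
SPARK_ENDPOINT = "https://query1.finance.yahoo.com/v7/finance/spark"


def build_spark_url(symbol: str, range_: str = "1d", interval: str = "5m") -> str:
    """Build the GET URL for one symbol (hypothetical helper, not a Yahoo API)."""
    params = {
        "symbols": symbol,  # "GC=F" is percent-encoded to "GC%3DF"
        "range": range_,
        "interval": interval,
        "indicators": "close",
        "includeTimestamps": "false",
        "includePrePost": "false",
        "corsDomain": "finance.yahoo.com",
        ".tsrc": "finance",
    }
    return f"{SPARK_ENDPOINT}?{urlencode(params)}"


print(build_spark_url("GC=F"))
```

This also shows why the symbol appears as GC%3DF in the copied URL: the = in GC=F has to be percent-encoded inside a query string.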
You need the json and requests libraries; there's no need for bs4. Note that making a lot of such requests might get your IP blocked (or rate limited), or you might not get any response at all, because their system may detect that it's a bot: a regular user wouldn't make such requests to the server repeatedly. So you need to figure out how to bypass it.
Update:
There's possibly a hard limit on how many requests can be made within a given period of time.
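One way to cope with such a limit is to slow down and retry instead of hammering the server. The sketch below is a generic retry helper with exponential backoff; the name call_with_backoff, the default delays, and the number of attempts are all assumptions made for illustration, since the actual limits are not documented here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); after each failure wait base_delay * 2**attempt, then retry.

    Hypothetical helper: the delays are guesses, not values published by Yahoo.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            # Do not sleep after the final failed attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

With requests, it could be used as call_with_backoff(lambda: requests.get(url, timeout=10).json()), where url is the query URL from the Network tab. The sleep parameter is injectable so the helper can be tried out without actually waiting.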
Code and example in the online IDE (contains full JSON response):
import requests, json
response = requests.get('https://query1.finance.yahoo.com/v7/finance/spark?symbols=GC%3DF&range=1d&interval=5m&indicators=close&includeTimestamps=false&includePrePost=false&corsDomain=finance.yahoo.com&.tsrc=finance').text
data_1 = json.loads(response)
gold_price = data_1['spark']['result'][0]['response'][0]['meta']['previousClose']
print(gold_price)
# 1830.8
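Since the server may return an empty or differently shaped response (for example when a request is blocked), it may be worth guarding the lookup. The following is a minimal sketch that walks the same key path as above and returns None if anything is missing. The function name extract_previous_close is made up for the example, and the sample dictionary is hand-written to mirror that key path; it is not a real response.

```python
from typing import Any, Optional


def extract_previous_close(payload: Any) -> Optional[float]:
    """Return previousClose from a spark-style payload, or None if absent."""
    try:
        return payload["spark"]["result"][0]["response"][0]["meta"]["previousClose"]
    except (KeyError, IndexError, TypeError):
        # Missing key, empty list, or a payload that is not a dict at all.
        return None


# Hand-written sample shaped like the key path used above (not real data).
sample = {"spark": {"result": [{"response": [{"meta": {"previousClose": 1830.8}}]}]}}

print(extract_previous_close(sample))                   # 1830.8
print(extract_previous_close({"spark": {"result": []}}))  # None
```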
P.S. I have a blog post about scraping the Yahoo! Finance home page, which is somewhat relevant.