I'm trying to scrape JavaScript-generated data from a website. To do this, I'm using the Python library requests_html.
Here is my code:
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://myurl'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
payload = {'mylog': 'root', 'mypass': 'root'}

r = session.post(url, headers=headers, verify=False, data=payload)
r.html.render()

load = r.html.find('#load_span', first=True)
print(load.text)
If I don't call render(), I can connect to the website, but the scraped data is empty (which is expected, since that data is generated by JavaScript). When I do call render(), I get this error:
pyppeteer.errors.PageError: net::ERR_CERT_COMMON_NAME_INVALID at https://myurl
or
net::ERR_CERT_WEAK_SIGNATURE_ALGORITHM
I assume the verify=False parameter of session.post() is ignored by render(). How can I make render() ignore certificate errors as well?
Edit: if you want to reproduce the error:
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://wrong.host.badssl.com'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

r = session.post(url, headers=headers, verify=False)
r.html.render()

load = r.html.find('#content', first=True)
print(load)