How to ignore an invalid SSL certificate with requests_html?

2024/11/16 18:00:51

So basically I'm trying to scrap the javascript generated data from a website. To do this, I'm using the Python library requests_html.

Here is my code :

from requests_html import HTMLSession
session = HTMLSession()url = 'https://myurl'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
payload = {'mylog': 'root', 'mypass': 'root'}r = session.post(url, headers=headers, verify=False, data=payload)
r.html.render()
load = r.html.find('#load_span', first=True)print (load.text)  

If I don't use the render() function, I can connect to the website and my scraped data is null (which is normal) but when I use it, I have this error :

pyppeteer.errors.PageError: net::ERR_CERT_COMMON_NAME_INVALID at https://myurl

or

net::ERR_CERT_WEAK_SIGNATURE_ALGORITHM

I assume the parameter "verify=False" of session.post is ignored by the render. How do I do it ?

Edit : If you want to reproduce the error :

from requests_html import HTMLSession
import requestssession = HTMLSession()url = 'https://wrong.host.badssl.com'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}r = session.post(url, headers=headers, verify=False)r.html.render()load = r.html.find('#content', first=True)print (load)
Answer

The only way is to set the ignoreHTTPSErrors parameter in pyppeteer. The problem is that requests_html doesn't provide any way to set this parameter, in fact, there is an issue about it. My advice is to ping again the developers by adding another message here.

Or maybe you can pull this new feature.

Another way is to use Selenium.

EDIT:
I added verify=False as a feature with a pull request (accepted). Now is possible to ignore the SSL error :)

It's not a parameter of the Get() set it when you instantiate the object:

session = HTMLSession(verify=False)
https://en.xdnf.cn/q/71645.html

Related Q&A

Fabric asks for root password

I am using Fabric to run the following:def staging():""" use staging environment on remote host"""env.user = ubuntuenv.environment = stagingenv.hosts = [host.dev]_setup_pa…

Beautifulsoup results to pandas dataframe

The below code returns me a table with the following resultsr = requests.get(url) soup = bs4.BeautifulSoup(r.text, lxml)mylist = soup.find(attrs={class: table_grey_border}) print(mylist)results - it st…

XGBoost CV and best iteration

I am using XGBoost cv to find the optimal number of rounds for my model. I would be very grateful if someone could confirm (or refute), the optimal number of rounds is: estop = 40res = xgb.cv(params, d…

Whats the correct way to implement a metaclass with a different signature than `type`?

Say I want to implement a metaclass that should serve as a class factory. But unlike the type constructor, which takes 3 arguments, my metaclass should be callable without any arguments:Cls1 = MyMeta()…

Python -- Regex -- How to find a string between two sets of strings

Consider the following:<div id=hotlinklist><a href="foo1.com">Foo1</a><div id=hotlink><a href="/">Home</a></div><div id=hotlink><a…

Kivy TextInput horizontal and vertical align (centering text)

How to center a text horizontally in a TextInput in Kivy?I have the following screen:But I want to centralize my text like this:And this is part of my kv language:BoxLayout: orientation: verticalLabe…

How to capture python SSL(HTTPS) connection through fiddler2

Im trying to capture python SSL(HTTPS) connections through Fiddler2 local proxy. But I only got an error.codeimport requests requests.get("https://www.python.org", proxies={"http": …

removing leading 0 from matplotlib tick label formatting

How can I change the ticklabels of numeric decimal data (say between 0 and 1) to be "0", ".1", ".2" rather than "0.0", "0.1", "0.2" in matplo…

How do I check if an iterator is actually an iterator container?

I have a dummy example of an iterator container below (the real one reads a file too large to fit in memory):class DummyIterator:def __init__(self, max_value):self.max_value = max_valuedef __iter__(sel…

Python Terminated Thread Cannot Restart

I have a thread that gets executed when some action occurs. Given the logic of the program, the thread cannot possibly be started while another instance of it is still running. Yet when I call it a sec…