I currently have a little script that downloads a webpage and extracts some data I'm interested in. Nothing fancy.
Currently I'm downloading the page like so:
import commands
command = 'wget --output-document=- --quiet --http-user=USER --http-password=PASSWORD https://www.example.ca/page.aspx'
status, text = commands.getstatusoutput(command)
Although this works perfectly, I thought it'd make sense to remove the dependency on wget. I thought it would be trivial to convert the above to urllib2, but so far I've had zero success. The Internet is full of urllib2 examples, but I haven't found anything that matches my need for simple username and password HTTP authentication with an HTTPS server.
This says it should be straightforward, as long as your local Python has SSL support.
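You can quickly check whether your build has SSL support by trying to import the ssl module (a minimal sketch; virtually every modern Python build includes it):

```python
try:
    import ssl
    # the module is only importable when Python was compiled against OpenSSL
    print('SSL support available:', ssl.OPENSSL_VERSION)
except ImportError:
    print('no SSL support; urllib2 cannot open https:// URLs')
```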
If you use plain HTTP Basic Authentication, you need to set up a different handler, as described here.
Quoting the example there:
import urllib2

theurl = 'http://www.someserver.com/toplevelurl/somepage.htm'
username = 'johnny'
password = 'XXXXXX'  # a great password

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
# this creates a password manager
passman.add_password(None, theurl, username, password)
# because we have put None at the start it will always
# use this username/password combination for urls
# for which `theurl` is a super-url

authhandler = urllib2.HTTPBasicAuthHandler(passman)
# create the AuthHandler

opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
# All calls to urllib2.urlopen will now use our handler
# Make sure not to include the protocol in with the URL, or
# HTTPPasswordMgrWithDefaultRealm will be very confused.
# You must (of course) use it when fetching the page though.

pagehandle = urllib2.urlopen(theurl)
# authentication is now handled automatically for us
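For what it's worth, there's no magic in Basic authentication: the handler simply base64-encodes username:password and sends it in the Authorization header when the server challenges with a 401. You can see exactly what goes over the wire yourself (a sketch using the example credentials above), which is also why Basic auth should only ever be used over HTTPS:

```python
import base64

username = 'johnny'
password = 'XXXXXX'

# the Authorization header value is literally "Basic " + base64("user:pass")
credentials = base64.b64encode(('%s:%s' % (username, password)).encode('ascii'))
auth_header = 'Basic ' + credentials.decode('ascii')
print(auth_header)  # Basic am9obm55OlhYWFhYWA==
```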
If you do Digest, you'll have to set some additional headers, but they are the same regardless of SSL usage. Google for python+urllib2+http+digest.
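If the server does require Digest, urllib2 ships an HTTPDigestAuthHandler that plugs into the same password-manager pattern as above. A minimal sketch (the try/except import is only so the snippet also runs under Python 3, where urllib2 became urllib.request; the URL and credentials are the placeholders from the question, so the final urlopen is left commented out):

```python
try:
    import urllib2 as urlreq            # Python 2, as used above
except ImportError:
    import urllib.request as urlreq     # Python 3 renamed urllib2

theurl = 'https://www.example.ca/page.aspx'

passman = urlreq.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, 'USER', 'PASSWORD')

# HTTPDigestAuthHandler performs the challenge/response (nonce) exchange
# for you; usage is otherwise identical to HTTPBasicAuthHandler
authhandler = urlreq.HTTPDigestAuthHandler(passman)
opener = urlreq.build_opener(authhandler)
urlreq.install_opener(opener)

# pagehandle = urlreq.urlopen(theurl)  # would now authenticate via Digest
```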
Cheers,