Python: verify that a URL goes to a page

2024/9/30 3:22:04

I have a list of URLs (1000+) that have been stored for over a year now. I want to go through them all and verify whether they still exist. What is the best / quickest way to check them all and return a list of the ones that no longer return a site?

Answer

This is kind of slow, but you can use something like this to check whether a URL is alive (urllib2 in Python 2, urllib.request in Python 3):

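import urllib.request  # urllib2 in Python 2

def url_exists(url):
    try:
        with urllib.request.urlopen(url, timeout=10):
            return True   # URL exists
    except ValueError:
        return False      # URL is not well formed
    except OSError:
        return False      # URL doesn't seem to be alive (URLError, HTTPError and timeouts are all OSError subclasses)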
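Checking 1000+ URLs one after another is slow mainly because each request spends most of its time waiting on the network. A minimal sketch of one way to speed this up, assuming the work is I/O-bound so threads help: run the same urllib check in a thread pool and collect the URLs that fail. The names url_is_alive and find_dead_urls, the 20-worker default and the 10-second timeout are illustrative choices, not part of the original answer.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def url_is_alive(url, timeout=10):
    """Return True if url can be opened, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except ValueError:
        # Malformed URL, e.g. missing scheme
        return False
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (404, 500, ...), timeouts and bad responses
        return False


def find_dead_urls(urls, check=url_is_alive, max_workers=20):
    """Run `check` on every URL in parallel threads and return the URLs
    for which it returned False, in their original order."""
    urls = list(urls)  # allow any iterable; we iterate it twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input
        results = list(pool.map(check, urls))
    return [url for url, alive in zip(urls, results) if not alive]
```

Passing a different `check` function (for example one that only sends HEAD requests) leaves the parallel part unchanged; keep `max_workers` modest so that no single server gets hammered with simultaneous requests.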

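Quicker than urllib2, you can use httplib (http.client in Python 3). Note that this only checks that the server accepts a connection, not that a particular page exists: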

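import http.client  # httplib in Python 2

conn = http.client.HTTPConnection('google.com', timeout=10)
try:
    conn.connect()
except OSError:  # a failed connect() raises a socket error, not HTTPException
    print("not connected")
finally:
    conn.close()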
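Opening a connection says nothing about whether a specific page on that server still exists. A minimal sketch that stays with http.client but sends a HEAD request for the URL's path and returns the status code, assuming the servers being checked answer HEAD requests (some reply 405 and need a GET instead). head_status is a hypothetical name, and only http and https URLs are handled.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Send a HEAD request for url and return the HTTP status code,
    or None if the server cannot be reached."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_cls = http.client.HTTPConnection
    else:
        raise ValueError(f"unsupported URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"no hostname in URL: {url!r}")
    # Request only the path (plus query string); HEAD returns headers, no body
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, TLS error or bad response
        return None
    finally:
        conn.close()
```

http.client does not follow redirects, so a moved page comes back as a 3xx status rather than 200; treat 404/410 or None as "gone" and decide separately how to handle redirects.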

You can also do a DNS lookup, although this is not a very practical way to check whether a website exists: a hostname can still resolve even though the page, or the web server, is gone.

import socket

try:
    socket.gethostbyname('www.google.com')
except socket.gaierror:
    print("does not exist")
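With 1000+ URLs, many usually share a hostname, so a DNS pre-pass can cheaply weed out domains that no longer exist before any HTTP requests are made. A minimal sketch, assuming a failed lookup (socket.gaierror) means the domain is gone; a temporary resolver failure is counted the same way, so re-check those before deleting anything, and hosts that do resolve still need one of the HTTP checks. unresolvable_urls and its resolve parameter (there so a fake resolver can be swapped in) are hypothetical names.

```python
import socket
from urllib.parse import urlsplit


def unresolvable_urls(urls, resolve=socket.gethostbyname):
    """Return the URLs whose hostname does not resolve, in input order.

    Each distinct hostname is looked up only once.
    """
    resolves = {}  # hostname -> True if the DNS lookup succeeded
    dead = []
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            dead.append(url)  # no hostname at all, e.g. a malformed entry
            continue
        if host not in resolves:
            try:
                resolve(host)
                resolves[host] = True
            except socket.gaierror:
                resolves[host] = False
        if not resolves[host]:
            dead.append(url)
    return dead
```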
https://en.xdnf.cn/q/71127.html
