I have a list of urls (1000+) which have been stored for over a year now. I want to run through and verify them all to see if they still exist. What is the best / quickest way to check them all and return a list of ones which do not return a site?
This is kind of slow, but you can use something like the following to check whether a URL is alive:
    import urllib2

    def url_exists(url):
        try:
            urllib2.urlopen(url)
            return True                   # URL exists
        except ValueError:
            return False                  # URL not well formatted
        except urllib2.URLError:
            return False                  # URL doesn't seem to be alive
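Since you have 1000+ URLs, checking them one at a time will be slow; almost all the time is spent waiting on the network, so a thread pool helps a lot. Here is a minimal sketch using multiprocessing.dummy (a thread-based Pool from the standard library). The urls.txt filename, the pool size of 20, and the 5-second timeout are just assumptions for illustration:

    import urllib2
    from multiprocessing.dummy import Pool   # thread pool, fine for I/O-bound work

    def url_exists(url, timeout=5):
        # assumed 5s timeout so dead hosts fail fast instead of hanging
        try:
            urllib2.urlopen(url, timeout=timeout)
            return True
        except Exception:                    # URLError, ValueError, socket.timeout, ...
            return False

    urls = [line.strip() for line in open('urls.txt')]   # assumed: one URL per line
    pool = Pool(20)                          # assumed pool size; tune to your bandwidth
    alive = pool.map(url_exists, urls)
    dead = [u for u, ok in zip(urls, alive) if not ok]   # URLs that no longer respond
    print '\n'.join(dead)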
Quicker than urllib2, you can use httplib:
    import httplib
    import socket

    try:
        conn = httplib.HTTPConnection('google.com')
        conn.connect()
    except (httplib.HTTPException, socket.error) as ex:
        # connect() raises socket.error on failure, so catch that too
        print "not connected"
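If you want to avoid downloading page bodies entirely, a HEAD request fetches only the response headers. A minimal sketch, assuming your stored entries are full URLs; the responds name, the 5-second timeout, and treating any status below 400 as "alive" are my own choices, not anything from the original:

    import httplib
    import socket
    import urlparse

    def responds(url, timeout=5):
        parts = urlparse.urlparse(url)
        try:
            conn = httplib.HTTPConnection(parts.hostname, parts.port or 80,
                                          timeout=timeout)
            conn.request('HEAD', parts.path or '/')     # headers only, no body
            return conn.getresponse().status < 400      # 2xx/3xx counts as alive
        except (httplib.HTTPException, socket.error):
            return False

Note that a few servers mishandle HEAD requests, so you may want to fall back to a normal GET when this reports a failure.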
You can also do a DNS check (it is not a reliable way to tell whether a website is down, since a hostname can still resolve while the site itself no longer responds):
    import socket

    try:
        socket.gethostbyname('www.google.com')
    except socket.gaierror as ex:
        print "does not exist"
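Because a DNS lookup is much cheaper than a full HTTP request, one way to use this is as a pre-filter: drop URLs whose hostnames no longer resolve before running the slower HTTP checks. A sketch, again assuming full URLs; the hostname_resolves name is hypothetical:

    import socket
    import urlparse

    def hostname_resolves(url):
        host = urlparse.urlparse(url).hostname   # strip scheme, path, and port
        if host is None:                         # malformed URL
            return False
        try:
            socket.gethostbyname(host)
            return True
        except socket.gaierror:
            return False

    # usage: keep only URLs whose hosts still resolve, then HTTP-check those
    # candidates = [u for u in urls if hostname_resolves(u)]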