I have a list of domains and need to pick out the ones served by a CDN (Content Delivery Network). I plan to do this with a Python script.
At first I thought I could identify them from the domain name, but not all of these domain names contain the keyword "cdn".
Is there any property of CDN-served domains that I can use to identify them?
First of all, you can't do it with 100% accuracy.
But in many cases you can identify domains that use popular cloud providers by following their CNAME records, which lead to the respective provider's servers.
For example, here's the Amazon CloudFront doc on this: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/CNAMEs.html
In CloudFront, an alternate domain name, also known as a CNAME, lets you use your own domain name (for example, www.example.com) for links to your objects instead of using the domain name that CloudFront assigns to your distribution
Example:
dig -t CNAME c.amazon-adsystem.com
c.amazon-adsystem.com. 896 IN CNAME d1ykf07e75w7ss.cloudfront.net.
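
Since you mentioned Python, here is a rough sketch of the same check using only the standard library. The suffix table and the function names (cdn_for_hostname, detect_cdn) are made up for illustration. Only cloudfront.net comes from the example above; the other entries are assumptions you would need to verify and extend. It also assumes your system resolver reports the CNAME target as the canonical name in socket.gethostbyname_ex, which can vary by platform.

```python
import socket

# Illustrative, incomplete table of CNAME suffixes -> provider label.
# Only cloudfront.net is taken from the dig example; the rest are assumptions.
CDN_SUFFIXES = {
    "cloudfront.net": "Amazon CloudFront",
    "akamaiedge.net": "Akamai",
    "fastly.net": "Fastly",
}


def cdn_for_hostname(name, suffixes=CDN_SUFFIXES):
    """Return the provider label whose suffix matches name, or None."""
    host = name.rstrip(".").lower()  # drop the trailing dot that dig prints
    for suffix, provider in suffixes.items():
        # Match the suffix itself or any subdomain of it, not mere substrings.
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


def detect_cdn(domain, suffixes=CDN_SUFFIXES):
    """Resolve domain and test its canonical name and aliases against the table."""
    try:
        canonical, aliases, _addresses = socket.gethostbyname_ex(domain)
    except OSError:  # covers socket.gaierror / socket.herror (lookup failed)
        return None
    for name in [canonical, *aliases]:
        provider = cdn_for_hostname(name, suffixes)
        if provider:
            return provider
    return None
```

You would call detect_cdn("c.amazon-adsystem.com") for each domain in your list. A None result only means no known suffix matched, not that the domain is definitely not behind a CDN.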