Consider the following:
<div id=hotlinklist><a href="foo1.com">Foo1</a><div id=hotlink><a href="/">Home</a></div><div id=hotlink><a href="/extract">Extract</a></div><div id=hotlink><a href="/sitemap">Sitemap</a></div>
</div>
How would you extract just the sitemap link with a regex in Python?
<a href="/sitemap">Sitemap</a>
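The following pattern can be used to pull out the anchor tags (the PHP/JavaScript-style /.../i delimiters don't exist in Python; the case-insensitive flag is passed separately):
re.compile(r'<a(.*?)</a>', re.IGNORECASE)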
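To see why a plain anchor pattern isn't enough, here is a small sketch that runs a non-capturing variant of it over the snippet above with Python's re module. The variable names html and anchors are only illustrative.

```python
import re

# The markup from the question, held in a hypothetical variable named html.
html = ('<div id=hotlinklist><a href="foo1.com">Foo1</a>'
        '<div id=hotlink><a href="/">Home</a></div>'
        '<div id=hotlink><a href="/extract">Extract</a></div>'
        '<div id=hotlink><a href="/sitemap">Sitemap</a></div>\n</div>')

# No group, so findall returns each whole match. \b stops "<a" from also
# matching tags like <abbr>; .*? stops at the first closing </a>.
anchors = re.findall(r'<a\b.*?</a>', html, re.IGNORECASE)

for a in anchors:
    print(a)  # all four anchors are printed, not just the sitemap one
```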
However, there are multiple anchor tags, so that pattern matches all of them. There are also several divs with id=hotlink, so the id can't be used to single out the sitemap link either, can it?
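One possible approach, sketched under the assumption that the sitemap anchor can be identified by its href value "/sitemap" (taken from the snippet) rather than by the repeated hotlink id. The names html, pattern and match are illustrative only.

```python
import re

html = ('<div id=hotlinklist><a href="foo1.com">Foo1</a>'
        '<div id=hotlink><a href="/">Home</a></div>'
        '<div id=hotlink><a href="/extract">Extract</a></div>'
        '<div id=hotlink><a href="/sitemap">Sitemap</a></div>\n</div>')

# Key the match on the href instead of the div id. [^>]* cannot cross a '>',
# so the href test stays inside a single opening <a ...> tag, and .*?
# stops at the first closing </a> after it.
pattern = re.compile(r'<a\b[^>]*href="/sitemap"[^>]*>.*?</a>', re.IGNORECASE)

match = pattern.search(html)
if match:
    print(match.group(0))  # <a href="/sitemap">Sitemap</a>
```

This relies on the attribute being written exactly as href="/sitemap" with double quotes; single quotes, extra whitespace around the = sign, or a trailing slash would need a looser pattern, which is one reason an HTML parser is often preferred for this kind of extraction.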