python url extract from html

2024/11/19 22:34:02

I need python regex to extract url's from html, example html code :

<a href=""http://a0c5e.site.it/r"" target=_blank><font color=#808080>MailUp</font></a>
<a href=""http://www.site.it/prodottiLLPP.php?id=1"" class=""txtBlueGeorgia16"">Prodotti</a>
<a href=""http://www.site.it/terremoto.php"" target=""blank"" class=""txtGrigioScuroGeorgia12"">Terremoto</a>
<a class='mini' href='http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse'>clicca qui.</a>`

I need extract only:

 http://a0c5e.site.it/rhttp://www.site.it/prodottiLLPP.php?id=1http://www.site.it/terremoto.phphttp://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse
Answer

Regex might solve your problem, but consider using BeautifulSoup

>>> html = """<a href="http://a0c5e.site.it/r" target=_blank><font color=#808080>MailUp</font></a>
<a href="http://www.site.it/prodottiLLPP.php?id=1" class=""txtBlueGeorgia16"">Prodotti</a>
<a href="http://www.site.it/terremoto.php" target=""blank"" class=""txtGrigioScuroGeorgia12"">Terremoto</a>
<a class='mini' href='http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse'>clicca qui.</a>`"""
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(html)
>>> [e['href'] for e in soup.findAll('a')]
[u'http://a0c5e.site.it/r', u'http://www.site.it/prodottiLLPP.php?id=1', u'http://www.site.it/terremoto.php', u'http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse']

From Jon Clements

soup.findAll('a', {'href': True}) 

On a different note, your href quotaion in your html snippet is incorrect.

https://en.xdnf.cn/q/119900.html

Related Q&A

Regex match each character at least once [closed]

Its difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying thi…

How to cluster with K-means, when number of clusters and their sizes are known [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.Want to improve this question? Update the question so it focuses on one problem only by editing this post.Closed 1…

Converting German characters (like , etc) from Mac Roman to UTF (or similar)?

I have a CSV file which I can read in and at all works fine except for the specific German (and possibly other) characters. Ive used chardet to determine that the encoding is Mac Roman import chardetde…

Caesar cipher without knowing the Key

Hey guys if you look at my code below you will be able to see that i was able to create a program that can open a file decode the content of the file and save it into another file but i need to input t…

how to convert u\uf04a to unicode in python [duplicate]

This question already has answers here:Python unicode codepoint to unicode character(4 answers)Closed 2 years ago.I am trying to decode u\uf04a in python thus I can print it without error warnings. In …

How can I display a nxn matrix depending on users input?

For a school task I need to display a nxn matrix depending on users input: heres an example: 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0(users input: 5) And here is my code until now:n = int(inpu…

How to launch 100 workers in multiprocessing?

I am trying to use python to call my function, my_function() 100 times. Since my_function takes a while to run, I want to parallelize this process. I tried reading the docs for https://docs.python.org/…

Indexes of a list Python

I am trying to find how to print the indexes of words in a list in Python. If the sentence is "Hello world world hello name") I want it to print the list "1, 2, 2, 1, 3")I removed a…

str object is not callable - CAUTION: DO NO USE SPECIAL FUNCTIONS AS VARIABLES

EDIT: If you define a predefined type such as: str = 5 then theoriginal functionality of that predefined will change to a new one. Lesson Learnt: Do not give variables names that are predefined or bel…

Using `wb.save` results in UnboundLocalError: local variable rel referenced before assignment

I am trying to learn how to place an image in an Excel worksheet but I am having a problem with wb.save. My program ends with the following error:"C:\Users\Don\PycharmProjects\Test 2\venv\Scripts\…