Beautiful soup: Extract everything between two tags
I have seen the question linked above, which extracts the information between two tags. However, I need to get the information between two tags that have different id attribute values, like this:
<h1 id = 'beautiful' ></h1>Text <i>here</i> has no tag<div>This is in a div</div><h1 id = 'good' ></h1>
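A minimal sketch for the markup above, assuming both h1 tags share the same parent (here, the document root). It looks up each tag by its id with find() and walks the start tag's next_siblings until it reaches the end tag, leaving the parsed tree unchanged. The helper name html_between is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html_doc = "<h1 id = 'beautiful' ></h1>Text <i>here</i> has no tag<div>This is in a div</div><h1 id = 'good' ></h1>"


def html_between(soup, tag_name, start_id, end_id):
    """Hypothetical helper: return the markup of the sibling nodes strictly
    between the tag with id=start_id and the tag with id=end_id."""
    start = soup.find(tag_name, id=start_id)
    end = soup.find(tag_name, id=end_id)
    if start is None or end is None:
        raise ValueError("start or end tag not found")
    parts = []
    # next_siblings only moves forward at the same nesting level, so this
    # assumes both marker tags have the same parent.
    for node in start.next_siblings:
        if node is end:
            break
        parts.append(str(node))
    return "".join(parts)


soup = BeautifulSoup(html_doc, "html.parser")
result = html_between(soup, "h1", "beautiful", "good")
print(result)
```

Because nothing is extracted, the same soup can be queried again for other id pairs.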
I am using BeautifulSoup to extract data from HTML files. I want to get all of the information between the two tags. This means that if I have an HTML section like this:
<h1></h1>Text <i>here</i> has no tag<div>This is in a div</div><h1></h1>
Then if I wanted all of the information between the first h1 and the second h1, the output would look like this:
Text <i>here</i> has no tag<div>This is in a div</div>
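from bs4 import BeautifulSoup

html_doc = '''
This I <b>don't</b> want
<h1></h1>
Text <i>here</i> has no tag
<div>This is in a div</div>
<h1></h1>
This I <b>don't</b> want too
'''

soup = BeautifulSoup(html_doc, 'html.parser')

# Remove every top-level node except the first <h1> and the nodes that follow it
# (the second <h1> is kept too, because its nearest preceding <h1> is the first one).
for c in list(soup.contents):
    if c is soup.h1 or c.find_previous('h1') is soup.h1:
        continue
    c.extract()

# Finally drop the <h1> markers themselves.
for h1 in soup.select('h1'):
    h1.extract()

print(soup)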
Prints:
Text <i>here</i> has no tag
<div>This is in a div</div>
This works when the tags have no id attributes.
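One possible adaptation of the quoted code, sketched under the assumption that no other h1 tag appears between the two markers: look the markers up with find('h1', id=...) instead of soup.h1, keep a node only if its nearest preceding h1 is the start marker and its nearest following h1 is the end marker, and remove the markers only after the loop (find_previous() and find_next() walk the tree, so extracting a marker early would break the checks for the nodes after it). Like the original, this modifies the soup in place.

```python
from bs4 import BeautifulSoup

html_doc = '''
This I <b>don't</b> want
<h1 id = 'beautiful' ></h1>
Text <i>here</i> has no tag
<div>This is in a div</div>
<h1 id = 'good' ></h1>
This I <b>don't</b> want too
'''

soup = BeautifulSoup(html_doc, 'html.parser')

# Look the markers up by id instead of relying on soup.h1 (the first <h1>).
start = soup.find('h1', id='beautiful')
end = soup.find('h1', id='good')

for c in list(soup.contents):
    # Keep both markers during the loop: the checks below depend on them.
    if c is start or c is end:
        continue
    # Keep nodes whose nearest preceding <h1> is the start marker and whose
    # nearest following <h1> is the end marker (assumes no <h1> in between).
    if c.find_previous('h1') is start and c.find_next('h1') is end:
        continue
    c.extract()

# Drop the two markers themselves.
start.extract()
end.extract()

print(soup)
```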
Could someone help me in this regard?