I'm writing a Python script that converts an HTML document into a reveal.js slideshow. To do this, I need to wrap multiple tags inside a <section> tag.
It's easy to wrap a single tag inside another one using the wrap() method. However, I can't figure out how to wrap multiple tags at once.
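For reference, this is the single-tag case that already works for me. It is a minimal sketch; the one-paragraph document and the variable names `p` and `wrapper` are made up for illustration:

```python
from bs4 import BeautifulSoup

# A tiny stand-in document, not the real input
soup = BeautifulSoup("<body><p>Some text...</p></body>", "html.parser")
p = soup.find("p")

# wrap() moves the tag inside the new <section> and returns the wrapper
wrapper = p.wrap(soup.new_tag("section"))
print(soup)
```

The wrapper takes the place of the original tag in the tree, which is fine for one tag but does not say how to pull several siblings into the same wrapper.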
To clarify with an example, here is the original HTML:
html_doc = """
<html><head><title>The Dormouse's story</title>
</head><body><h1 id="first-paragraph">First paragraph</h1><p>Some text...</p><p>Another text...</p><div><a href="http://link.com">Here's a link</a></div><h1 id="second-paragraph">Second paragraph</h1><p>Some text...</p><p>Another text...</p><script src="lib/.js"></script>
</body></html>
"""
I'd like to wrap each <h1> and the tags that follow it inside a <section> tag, like this:
<html>
<head><title>The Dormouse's story</title>
</head>
<body><section><h1 id="first-paragraph">First paragraph</h1><p>Some text...</p><p>Another text...</p><div><a href="http://link.com">Here's a link</a></div></section><section><h1 id="second-paragraph">Second paragraph</h1><p>Some text...</p><p>Another text...</p></section><script src="lib/.js"></script>
</body></html>
Here's how I made the selection:
from bs4 import BeautifulSoup
import itertools
soup = BeautifulSoup(html_doc)
h1s = soup.find_all('h1')
for el in h1s:
    els = [i for i in itertools.takewhile(lambda x: x.name not in [el.name, 'script'], el.next_elements)]
    els.insert(0, el)
    print(els)
Output:
[<h1 id="first-paragraph">First paragraph</h1>, 'First paragraph', '\n ', <p>Some text...</p>, 'Some text...', '\n ', <p>Another text...</p>, 'Another text...', '\n ', <div><a href="http://link.com">Here's a link</a> </div>, '\n ', <a href="http://link.com">Here's a link</a>, "Here's a link", '\n ', '\n\n ']
[<h1 id="second-paragraph">Second paragraph</h1>, 'Second paragraph', '\n ', <p>Some text...</p>, 'Some text...', '\n ', <p>Another text...</p>, 'Another text...', '\n\n ']
The selection is correct, but I can't see how to wrap each selection inside a <section> tag.
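One direction that might work, as a minimal sketch rather than a definitive solution: wrap the <h1> itself with wrap(), then move its following siblings into the new wrapper with append(). This assumes the tags to group are all direct siblings of the <h1>, so it iterates over next_siblings instead of next_elements (which also descends into nested tags and strings). The shortened HTML and the names `group` and `section` are placeholders of mine:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real document
html = (
    "<body><h1>First</h1><p>a</p><div><a href='#'>link</a></div>"
    "<h1>Second</h1><p>b</p><script src='x.js'></script></body>"
)
soup = BeautifulSoup(html, "html.parser")

for h1 in soup.find_all("h1"):
    # Collect the siblings first, stopping at the next <h1> or a <script>
    group = []
    for sib in h1.next_siblings:
        if getattr(sib, "name", None) in ("h1", "script"):
            break
        group.append(sib)

    # Wrap the heading, then move each collected sibling into the wrapper
    section = h1.wrap(soup.new_tag("section"))
    for node in group:
        section.append(node)

print(soup)
```

Collecting the siblings into a list before touching the tree avoids modifying the document while iterating over it, since append() removes each node from its old position.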