Get a structure of HTML code

2024/11/17 12:54:22

I'm using BeautifulSoup4 and I'm curious whether is there a function which returns a structure (ordered tags) of the HTML code.

Here is an example:

<html>
<body>
<h1>Simple example</h1>
<p>This is a simple example of html page</p>
</body>
</html>

print page.structure():

>>
<html>
<body>
<h1></h1>
<p></p>
</body>
</html>

I tried to find a solution but no success.

Thanks

Answer

There is not, to my knowledge, but a little recursion should work:

def taggify(soup):for tag in soup:if isinstance(tag, bs4.Tag):yield '<{}>{}</{}>'.format(tag.name,''.join(taggify(tag)),tag.name)

demo:

html = '''<html><body><h1>Simple example</h1><p>This is a simple example of html page</p></body></html>'''soup = BeautifulSoup(html)''.join(taggify(soup))
Out[34]: '<html><body><h1></h1><p></p></body></html>'
https://en.xdnf.cn/q/71547.html

Related Q&A

max_help_position is not works in python argparse library

Hi colleagues I have the code (max_help_position is 2000):formatter_class=lambda prog: argparse.HelpFormatter(prog, max_help_position=2000) parser = argparse.ArgumentParser(formatter_class=formatter_cl…

Python - How to parse argv on the command line using stdin/stdout?

Im new to programming. I looked at tutorials for this, but Im just getting more confused. But what Im trying to do is use stdin and stdout to take in data, pass it through arguments and print out outpu…

Bad Request from Yelp API

Inspired by this Yelp tutorial, I created a script to search for all gyms in a given city. I tweaked the script with these updates in order to return ALL gyms, not just the first 20. You can find the g…

Python: test empty set intersection without creation of new set

I often find myself wanting to test the intersection of two sets without using the result of the intersections.set1 = set([1,2]) set2 = set([2,3]) if(set1 & set2):print("Non-empty intersection…

object has no attribute show

I have installed wxpython successfully which i verified by import wxBut when I write a code import wx class gui(wx.Frame):def __init__(self,parent,id):wx.Frame.__init__(self, parent,id,Visualisation fo…

How Does Calling Work In Python? [duplicate]

This question already has answers here:Does Python make a copy of objects on assignment?(5 answers)How do I pass a variable by reference?(40 answers)Why can a function modify some arguments as percei…

Python: Sklearn.linear_model.LinearRegression working weird

I am trying to do multiple variables linear regression. But I find that the sklearn.linear_model working very weird. Heres my code:import numpy as np from sklearn import linear_modelb = np.array([3,5,7…

Implementation of Gaussian Process Regression in Python y(n_samples, n_targets)

I am working on some price data with x = day1, day2, day3,...etc. on day1, I have lets say 15 price points(y), day2, I have 30 price points(y2), and so on.When I read the documentation of Gaussian Proc…

Converting a list of points to an SVG cubic piecewise Bezier curve

I have a list of points and want to connect them as smoothly as possible. I have a function that I evaluate to get these points. I could simply use more sampling points but that would only increase the…

Python Class Inheritance AttributeError - why? how to fix?

Similar questions on SO include: this one and this. Ive also read through all the online documentation I can find, but Im still quite confused. Id be grateful for your help.I want to use the Wand class…