Clean up ugly WYSIWYG HTML code? Python or *nix utility

2024/10/5 5:24:36

I'm finally upgrading (rewriting ;) ) my first Django app, but I am migrating all the content.

I foolishly gave users a full WYSIWYG editor for certain tasks, the HTML code produced is of course terribly ugly with more extra tags than content.

Does anyone know of a library or external shell app I could use to clean up the code?

I use tidy sometimes, but as far as I know that doesn't do what I'm asking. I want to simplify all the extra span and other garbage tags. I cleaned the most offensive offending styles with some regex, but I it would take a really long time to do anything more using just regex.

Any ideas?

Answer

You could also take a look at Bleach a white-list based HTML sanitizer. It uses html5lib to do what Kyle posted, but you'll get a lot more control over which elements and attributes are allowed in the final output.

https://en.xdnf.cn/q/70520.html

Related Q&A

Using config files written in Python [closed]

Closed. This question is opinion-based. It is not currently accepting answers.Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.Clo…

Model not defined when using foreign key to second model

Im trying to create several relationships between some models such as User and Country. When I try to syncdb, my console outputs "Name Country is not defined". Here is the code: class User(mo…

Sending a POST request to my RESTful API(Python-Flask), but receiving a GET request

Im trying to send a trigger to a Zapier webhook in the form of a POST request containing JSON. It works fine if I just send the POST request through a local python script.What I want to do is create a …

django deploy to Heroku : Server Error(500)

I am trying to deploy my app to heroku. Deploy was done correctly but I got Server Error(500). When I turned DEBUG true, Sever Error doesnt occurred. So I think there is something wrong with loading st…

Should I preallocate a numpy array?

I have a class and its method. The method repeats many times during execution. This method uses a numpy array as a temporary buffer. I dont need to store values inside the buffer between method calls. …

dup, dup2, tmpfile and stdout in python

This is a follow up question from here.Where I want do go I would like to be able to temporarily redirect the stdout into a temp file, while python still is able to print to stdout. This would involve …

How do I write a Class-Based Django Validator?

Im using Django 1.8.The documentation on writing validators has an example of a function-based validator. It also says the following on using a class:You can also use a class with a __call__() method f…

python synthesize midi with fluidsynth

I cant import fluidsynth. [Maybe theres an better module?]Im trying to synthesize midi from python or pygame. I can send midi events from pygame. Im using mingus, and it seemed pyfluidsynth would be g…

Convenient way to handle deeply nested dictionary in Python

I have a deeply nested dictionary in python thats taking up a lot of room. Is there a way to abbreviate something like this master_dictionary[sub_categories][sub_cat_name][attributes][attribute_name][s…

Running tests against existing database using pytest-django

Does anybody know how to run Django Tests using pytest-django against an existing (e.g. production) database? I know that in general, this is not what unit tests are supposed to do, but in my case, I…