python: find html tags and replace their attributes [duplicate]
2024/11/13 7:53:27
I need to do the following:
take html document
find every occurrence of 'img' tag
take their 'src' attribute
pass founded url to processing
change the 'src' attribute to the new one
do all this stuff with Python 2.7
P.S. I,ve heard about lmxl and BeautifulSoup. How do you recommend to solve this problem with? Maybe it would be better to use regexes then? or another something else?
Answer
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html_string)
for link in soup.findAll('a')link['src'] = 'New src'
html_string = str(soup)
I don't particularly like BeautifulSoup but it does the job for you. Try to not over-do your solution if you don't have to, this being one of the simpler things you can do to solve a general issue.
That said, building for the future is equally important but all your 6 requirements can be put down into one, "I want to change 'src' or all links to X"
I have a virtualenv at /opt/webapps/ff/ with its own Python installation. I have WSGIPythonHome set to /opt/webapps/ff in my Apache config file (and this is definitely getting used in some capacity, b…
I wrote a simple native GUI script with python-gtk. Now I want to give the user a button to send an email with an attachment.The script runs on Linux desktops. Is there a way to open the users preferr…
I have a 200k lines list of number ranges like start_position,stop position.
The list includes all kinds of overlaps in addition to nonoverlapping ones.the list looks like this[3,5]
[10,30]
[15,25]
[5…
I have a Tornado Websocket Server and I want to time out after 30 minutes of inactivity. I use self.close() to close the connection after 30 minutes of inactivity. But it seems that some connections st…
Im writing a script that will take as user inputed string, and print it vertically, like so:input = "John walked to the store"output = J w t t so a o h th l e on k re edIve written …
Im trying to remove gradient background noise from the images I have. Ive tried many ways with cv2 without success.Converting the image to grayscale at first to make it lose some gradients that may hel…
I have been playing around with subprocess lately. As I do more and more; I find myself needing root access. I was wondering if there is an easy way to enter the root password for a command that needs …
Im trying to plot a time series data, where for certain periods there is no data. Data is loaded into dataframe and Im plotting it using df.plot(). The problem is that the missing periods get connected…
Im trying to set up a Python sandbox and want to forbid access to standard and file I/O. I am running the sandbox inside of a running Python server.Ive already looked at modules like RestrictedPython a…
I have dataset which has more than 50k nodes and I am trying to extract possible edges and communities from them. I did try using some graph tools like gephi, cytoscape, socnet, nodexl and so on to v…