Use BeautifulSoup to extract sibling nodes between two nodes

2024/10/2 10:34:40

I've got a document like this:

<p class="top">I don't want this</p>
<p>I want this</p>
<table><!-- ... -->
</table>
<img ... />
<p> and all that stuff too</p>
<p class="end">But not this and nothing after it</p>

I want to extract everything between the p[class=top] and p[class=end] paragraphs.

Is there a nice way I can do this with BeautifulSoup?

Answer

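The node.nextSibling attribute is what you need. Start at the p[class=top] paragraph and keep stepping to the next sibling until you reach the p[class=end] paragraph: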

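from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path

soup = BeautifulSoup(html)  # html holds the document from the question
nextNode = soup.find('p', {'class': 'top'})
while True:
    nextNode = nextNode.nextSibling
    # Stop if the siblings run out before the closing paragraph is found.
    if nextNode is None:
        break
    # Stop at the closing paragraph without processing it.
    if getattr(nextNode, 'name', None) == 'p' and nextNode.get('class', None) == 'end':
        break
    # process nextNode here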

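The condition looks complicated because siblings can be string nodes (for example, the whitespace between elements) as well as tags. The getattr check makes sure that .get('class') is only called on an HTML tag and never on a string node.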
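The answer's import targets BeautifulSoup 3. As a rough sketch of the same idea for the newer bs4 package, the loop can be written with the next_siblings generator. Note that bs4 returns the class attribute as a list, so the comparison becomes a membership test. The helper name siblings_between, the wrapping div, and the filled-in table and img contents below are assumptions made for illustration and are not part of the original question.

```python
from bs4 import BeautifulSoup

# Sample document modeled on the question; the <div> wrapper, table cell
# and img src are made up so the example is complete and parseable.
html = (
    '<div>'
    '<p class="top">I don\'t want this</p><p>I want this</p>\n'
    '<table><tr><td>cell</td></tr></table>'
    '<img src="x.png"/><p> and all that stuff too</p>'
    '<p class="end">But not this and nothing after it</p>'
    '<p>after</p>'
    '</div>'
)


def siblings_between(soup, start_class="top", end_class="end"):
    """Collect the siblings after p.<start_class> up to, not including, p.<end_class>."""
    start = soup.find("p", class_=start_class)
    if start is None:
        return []
    collected = []
    for node in start.next_siblings:
        # Only tags reach the .get() call; string nodes fail the name check first.
        # bs4 stores class as a list, hence the membership test.
        if getattr(node, "name", None) == "p" and end_class in (node.get("class") or []):
            break
        collected.append(node)
    return collected


soup = BeautifulSoup(html, "html.parser")
nodes = siblings_between(soup)
print("".join(str(n) for n in nodes))
```

Because next_siblings simply stops when there are no more siblings, this version needs no explicit None check; if the closing paragraph is missing it returns everything after the opening one.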

