Python: Find a Sentence between some website-tags using regex

2024/11/19 16:25:30

I want to find a sentence between the ...class="question-hyperlink"> tags. With this code:

import urllib2
import re

response = urllib2.urlopen('https://stackoverflow.com/questions/tagged/python')
html = response.read(20000)
a = re.search('question-hyperlink', html)
print html[a.end()+3:a.end()+100]

I get:

DF5 for Python: high level vs low level interfaces. h5py</a></h3>        <div class="excerpt">

How can I stop at the next < ? And how do I find the next sentence? I want to do it with regex.

EDIT To the downvoters: I want to do it like he does: RegEx match open tags except XHTML self-contained tags

Answer

If you must do it with regular expressions, try something like this:

a = re.finditer('<a.+?question-hyperlink">(.+?)</a>', html)
for m in a:
    print m.group(1)
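
To address the "stop at the next <" part of the question directly: a negated character class such as [^<]+ matches everything up to, but not including, the next <. The following is only a minimal sketch, written in Python 3 syntax and run against a small hand-written sample instead of the live page. The sample_html string, its URLs, and the second title are made up for illustration.

```python
import re

# Hypothetical sample standing in for the downloaded page (not real site data).
sample_html = (
    '<h3><a href="/questions/1" class="question-hyperlink">'
    'HDF5 for Python: high level vs low level interfaces. h5py</a></h3>'
    '<div class="excerpt">...</div>'
    '<h3><a href="/questions/2" class="question-hyperlink">'
    'Second example title</a></h3>'
)

# [^<]+ consumes characters until the next '<', so each match ends at '</a>'.
pattern = re.compile(r'class="question-hyperlink">([^<]+)<')

# With a single capture group, findall returns the captured strings in order,
# which also answers "how do I find the next sentence".
titles = pattern.findall(sample_html)
for title in titles:
    print(title)
```

This still depends on the exact attribute order and quoting in the markup, so it may break if the page layout changes.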

For reference, this code does the same thing, but far more robustly:

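from bs4 import BeautifulSoup  # assumption: the BeautifulSoup 4 package is installed

doc = BeautifulSoup(html, 'html.parser')
# Second argument to findAll filters on the CSS class
for a in doc.findAll('a', 'question-hyperlink'):
    print a.text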
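
If installing a third-party package is not an option, a similar parser-based approach can be sketched with the standard library's html.parser module. This is a swapped-in technique, not what the answer above uses. It is written in Python 3 syntax, and the QuestionTitleParser class and the sample_html string are hypothetical names and data introduced only for this illustration.

```python
from html.parser import HTMLParser


class QuestionTitleParser(HTMLParser):
    """Collects the text inside <a> elements whose class list
    contains 'question-hyperlink' (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "question-hyperlink" in classes:
            self._in_link = True
            self.titles.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self._in_link:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


# Hypothetical sample standing in for the downloaded page.
sample_html = (
    '<h3><a class="question-hyperlink" href="/questions/1">'
    'HDF5 for Python: high level vs low level interfaces. h5py</a></h3>'
    '<div class="excerpt"><a href="/other">not a question link</a></div>'
    '<h3><a href="/questions/2" class="question-hyperlink">'
    'Regex &amp; HTML</a></h3>'
)

parser = QuestionTitleParser()
parser.feed(sample_html)
parser.close()

for title in parser.titles:
    print(title)
```

Unlike the regular expression, this does not care about attribute order, and character references such as &amp; are decoded by the parser.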