How to extract links from a page using Beautiful Soup

2024/10/2 16:23:33

I have an HTML page with multiple divs like:

<div class="post-info-wrap">
  <h2 class="post-title">
    <a href="https://www.example.com/blog/111/this-is-1st-post/" title="Example of 1st post &#8211; Example 1 Post" rel="bookmark">sample post &#8211; example 1 post</a>
  </h2>
  <div class="post-meta clearfix">

<div class="post-info-wrap">
  <h2 class="post-title">
    <a href="https://www.example.com/blog/111/this-is-2nd-post/" title="Example of 2nd post &#8211; Example 2 Post" rel="bookmark">sample post &#8211; example 2 post</a>
  </h2>
  <div class="post-meta clearfix">

and I need to get the href value from every div with class post-info-wrap (I am new to BeautifulSoup).

These are the URLs I need:

  • https://www.example.com/blog/111/this-is-1st-post/

  • https://www.example.com/blog/111/this-is-2nd-post/

  • and so on...

I have tried:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.example.com/blog/author/abc")
data = r.content  # content of the response
soup = BeautifulSoup(data, "html.parser")

for link in soup.select('.post-info-wrap'):
    print(link.find('a').attrs['href'])

This code doesn't seem to work. How can I extract the links?

Answer

You can use soup.find_all:

from bs4 import BeautifulSoup as soup

# html is the page source shown above (e.g. r.content from a requests.get call)
r = [i.a['href'] for i in soup(html, 'html.parser').find_all('div', {'class': 'post-info-wrap'})]

Output:

['https://www.example.com/blog/111/this-is-1st-post/', 'https://www.example.com/blog/111/this-is-2nd-post/']
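For completeness, below is a minimal end-to-end sketch, assuming the author-page URL from the question serves HTML like the snippet above. It fetches the page with requests, parses it with BeautifulSoup, and collects the href of the first link inside each post-info-wrap div:

import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/blog/author/abc"  # URL taken from the question

response = requests.get(url)
response.raise_for_status()  # fail fast on HTTP errors

soup = BeautifulSoup(response.content, "html.parser")

# href of the first <a> inside every div.post-info-wrap,
# skipping any div that contains no link
links = [
    div.a["href"]
    for div in soup.find_all("div", class_="post-info-wrap")
    if div.a is not None
]

print(links)

An equivalent CSS-selector version is soup.select(".post-info-wrap a"), which returns the <a> tags directly; the original attempt was already close to this and mainly needed the fused lines separated and the print call written as a function.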