Extracting anchor href attributes with the Python BeautifulSoup module

2024/10/13 9:14:58

I am using the BeautifulSoup module to select every href from an HTML page, like this:

def extract_links(html):
    soup = BeautifulSoup(html)
    anchors = soup.findAll('a')
    print anchors
    links = []
    for a in anchors:
        links.append(a['href'])
    return links

But sometimes it fails with this error message:

Traceback (most recent call last):
  File "C:\py\main.py", line 33, in <module>
    urls = extract_links(page)
  File "C:\py\main.py", line 11, in extract_links
    links.append(a['href'])
  File "C:\py\BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'href'
Answer

Not all anchor tags will have an href attribute. You should check that the anchor has an href before you try to access that attribute.

if a.has_key('href'):
    links.append(a['href'])

After checking some comments here, I think this is the most pythonic way of handling this case.
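The same check can be sketched for the newer bs4 package (an assumption: the question uses the older BeautifulSoup 3 module, where has_key is available). Here find_all('a', href=True) returns only the anchors that actually carry an href, so the lookup cannot raise KeyError. The sample string is made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every anchor that has one, skipping the rest."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only <a> tags with an href attribute present.
    return [a["href"] for a in soup.find_all("a", href=True)]


# Hypothetical sample: the middle anchor has no href (a named anchor).
sample = '<a href="/one">1</a><a name="top">no href</a><a href="/two">2</a>'
print(extract_links(sample))  # ['/one', '/two']
```

If the anchors without an href are still needed for other processing, a.get('href') is an alternative: it returns None instead of raising when the attribute is missing.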

https://en.xdnf.cn/q/69549.html
