What is the return type of the find_all method in Beautiful Soup?

2024/5/20 13:00:19
from bs4 import BeautifulSoup, SoupStrainer 
from urllib.request import urlopen
import pandas as pd 
import numpy as np 
import re
import csv
import ssl
import json
from googlesearch import search
from queue import Queue
import re

links = []
menu = []
filtered_menu = []

def contains(substring, string):
    if substring.lower() in string.lower():
        return True
    else:
        return False

for website in search("mr puffs", tld="com", num=1, stop=1, country="canada", pause=4):
    links.append(website)

soup = BeautifulSoup(urlopen(links.pop(0)), features="html.parser")
menu = soup.find_all('a', href=True)

for string in menu:
    if contains("contact", string):
        filtered_menu.append(string)

print(filtered_menu)

I am creating a web scraper that will extract contact information from sites. To do that, I first need to get to the contact page of each website. Using the googlesearch library, the code searches for a keyword and puts the results (up to a certain limit) in a list. For simplicity, this code only takes the first link. From this link, I create a BeautifulSoup object and extract all the other links on the page (because the contact information is usually not found on the homepage). I put these links in a list called menu.

Now, I want to filter menu down to only the links that have "contact" in them. Example: "www.smallBusiness.com/our-services" would be left out of the new list, while "www.smallBusiness.com/contact" or "www.smallBusiness.com/contact-us" would stay in it.

I defined a function that checks if a substring is in a string. However, I get the following exception:

TypeError: 'NoneType' object is not callable.

I've also tried using a regex with re.search, but it complains that it expected a string or bytes-like object as the argument.

I think it's because the return type of find_all is not a string. It's probably something else which I can't find in the docs. If so, how do I convert it into a string?

As requested in the answer below, here's what printing the menu list gives:

From here, I just want to extract the highlighted links:

here is the image

Answer

find_all() returns a bs4.element.ResultSet, which is a subclass of list.

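Each item in that result, in your case the loop variable you call "string", is of type bs4.element.Tag. A Tag treats access to an unknown attribute as a search for a child tag with that name, so string.lower evaluates to None, and calling it raises TypeError: 'NoneType' object is not callable.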
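A minimal sketch that checks these types, assuming a small hand-written HTML snippet in place of the downloaded page (the markup and the variable name first are made up for illustration):

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Hypothetical markup standing in for the downloaded page.
html = '<a href="/contact-us">Contact</a><a href="/our-services">Services</a>'
soup = BeautifulSoup(html, "html.parser")

menu = soup.find_all("a", href=True)
first = menu[0]

print(type(menu).__name__)     # class name of the find_all() result
print(isinstance(menu, list))  # ResultSet is a list subclass
print(type(first).__name__)    # class name of a single item

# Attribute access on a Tag searches for a child tag with that name,
# so first.lower looks for a <lower> child and finds nothing.
print(first.lower is None)

# str() gives the tag's full markup; the href attribute gives only the URL.
print(str(first))
print(first["href"])
```

So there are two ways to get a string out of a Tag: str(tag) for the whole element, or tag["href"] for just the link target.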

As your contains function expects type str, your for loop should look something like:

for string in menu:
    if contains("contact", str(string)):
        filtered_menu.append(string)
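Note that str(string) is the whole element, so a link whose visible text mentions "contact" would match even if its URL does not. If only the URL should be tested, one possible variation is to compare against the href attribute, or to pass a compiled regex to find_all. The sketch below assumes hand-written markup built from the example URLs in the question; the names pattern and regex_menu are made up for illustration:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; the last link mentions "Contact" only in its text.
html = """
<a href="https://www.smallBusiness.com/our-services">Our services</a>
<a href="https://www.smallBusiness.com/contact">Contact</a>
<a href="https://www.smallBusiness.com/Contact-Us">Get in touch</a>
<a href="https://www.smallBusiness.com/about">Contact our founders</a>
"""
soup = BeautifulSoup(html, "html.parser")


def contains(substring, string):
    """Case-insensitive substring check on plain strings."""
    return substring.lower() in string.lower()


# Option 1: test each tag's href attribute, which is a plain str.
filtered_menu = [
    a["href"]
    for a in soup.find_all("a", href=True)
    if contains("contact", a["href"])
]

# Option 2: let find_all do the filtering with a compiled regex on href.
pattern = re.compile("contact", re.IGNORECASE)
regex_menu = [a["href"] for a in soup.find_all("a", href=pattern)]

print(filtered_menu)
print(regex_menu)
```

Both lists hold URL strings rather than Tag objects, which is usually what the next request needs.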