Question 1

from bs4 import BeautifulSoup, SoupStrainer
from urllib.request import urlopen
import pandas as pd
import numpy as np
import re
import csv
import ssl
import json
from googlesearch import search
from queue import Queue

links = []
menu = []
filtered_menu = []

def contains(substring, string):
    if substring.lower() in string.lower():
        return True
    else:
        return False

for website in search("mr puffs", tld="com", num=1, stop=1, country="canada", pause=4):
    links.append(website)

soup = BeautifulSoup(urlopen(links.pop(0)), features="html.parser")
menu = soup.find_all('a', href=True)

for string in menu:
    if contains("contact", string):
        filtered_menu.append(string)

print(filtered_menu)

I am creating a web scraper that extracts contact information from sites. To do that, I first need to reach each site's contact page. Using the googlesearch library, the code searches for a keyword and puts the results (up to a certain limit) in a list. For simplicity, this code only takes the first link. From that link, I create a BeautifulSoup object and extract all the other links on the page, because the contact information is usually not on the homepage. I put these links in a list called menu.

Now, I want to filter menu down to only the links that contain "contact". Example: "www.smallBusiness.com/our-services" would be dropped from the new list, while "www.smallBusiness.com/contact" or "www.smallBusiness.com/contact-us" would stay.

I defined a function that checks whether a substring is in a string. However, I get the following exception:

TypeError: 'NoneType' object is not callable.

I've also tried a regex with re.search, but it raises an error saying it expected a string or bytes-like object.

I think it's because the return type of find_all is not a string. It's probably something else which I can't find in the docs. If so, how do I convert it into a string?

As requested in the answer below, I printed the menu list. From that output, I just want to extract the highlighted links:

[screenshot of the printed menu list; image not preserved]

Answer

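find_all() returns a bs4.element.ResultSet, which is a subclass of list.

Each item in that result, which in your case is the loop variable you call "string", is a bs4.element.Tag. Calling string.lower() on a Tag never reaches str.lower: attribute access on a Tag looks for a child tag with that name, finds none, and returns None, so calling it raises the TypeError you saw.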
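To make the types concrete, here is a minimal sketch that inspects what find_all() hands back. The inline HTML is a made-up stand-in for the fetched page, so the example runs without a network request.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Hypothetical markup standing in for the downloaded homepage.
html = '<a href="/our-services">Services</a><a href="/contact-us">Contact</a>'
soup = BeautifulSoup(html, "html.parser")

anchors = soup.find_all("a", href=True)
first = anchors[0]

# The result is a ResultSet, which behaves like a list of Tag objects.
print(isinstance(anchors, ResultSet), isinstance(anchors, list))
print(isinstance(first, Tag))

# Attribute access on a Tag searches for a child tag of that name.
# There is no <lower> child here, so this prints None; calling it
# would raise "TypeError: 'NoneType' object is not callable".
print(first.lower)

# str() renders the whole element; indexing reads one attribute.
print(str(first))
print(first["href"])
```

The last two lines show the two usual ways to get a plain str out of a Tag: the full markup via str(), or a single attribute value via indexing.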

Because your contains function expects a str, convert each item before passing it in. Your for loop should look something like:

for string in menu:
    if contains("contact", str(string)):
        filtered_menu.append(string)
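One thing to watch with str(string): it renders the whole element, so a link whose visible text mentions "contact" would match even if its URL does not. If only the URL should count, a possible variation is to test the href attribute instead. The sketch below is one way to do that, not the only one; SAMPLE_HTML and both function names are hypothetical, and the second function relies on find_all() accepting a compiled regular expression as an attribute filter.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched homepage.
SAMPLE_HTML = """
<a href="/our-services">Our services</a>
<a href="/contact">Contact</a>
<a href="/Contact-Us">Get in touch</a>
<a href="/about">Contact our team</a>
"""

def contact_hrefs(html, keyword="contact"):
    """Return href values that contain keyword, ignoring case."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for tag in soup.find_all("a", href=True):
        href = tag["href"]  # a plain str, so .lower() is safe here
        if keyword.lower() in href.lower():
            hrefs.append(href)
    return hrefs

def contact_hrefs_regex(html, keyword="contact"):
    """Same filter, letting find_all() match the href against a regex."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [tag["href"] for tag in soup.find_all("a", href=pattern)]

print(contact_hrefs(SAMPLE_HTML))
print(contact_hrefs_regex(SAMPLE_HTML))
```

With this sample, the "/about" link is skipped by both functions even though its text contains "Contact", whereas the str(string) check would keep it.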

What is the return type of the find_all method in Beautiful Soup?
