How to access the top five Google result links using BeautifulSoup

2024/9/25 12:30:42

I want to get the links of the top five (or any specified number of) Google search results. Through research, I found and modified the following code.

import requests
from bs4 import BeautifulSoup
import re    
search = raw_input("Search:")
page = requests.get("https://www.google.com/search?q=" + search)
soup = BeautifulSoup(page.content, "lxml")
links = soup.find("a")
print links.get('href')

This returns the first link on the page, which seems to be the Google images tab every time.

This is not quite what I want. For starters, I don't want links to any Google sites, just the search results. I also want the first three, five, or any specified number of results.

How can I do this in Python?

Thanks ahead of time!

Answer

You can use:

import requests
from bs4 import BeautifulSoup

search = input("Search:")
results = 100  # valid options 10, 20, 30, 40, 50, and 100
# Pass the query through params so requests URL-encodes it
page = requests.get("https://www.google.com/search", params={"q": search, "num": results})
soup = BeautifulSoup(page.content, "html5lib")
# href=True skips anchors that have no href attribute
links = soup.find_all("a", href=True)
for link in links:
    link_href = link.get('href')
    # Keep redirect links containing "url?q=" and skip cached copies
    if "url?q=" in link_href and "webcache" not in link_href:
        print(link_href.split("?q=")[1].split("&sa=U")[0])
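
The loop above prints every matching link, while the question asks for only the first few. A minimal sketch of that last step follows. It assumes result hrefs still have the form /url?q=<target>&sa=U..., which Google may change at any time. The helper name top_result_urls and its limit parameter are hypothetical, not part of the original answer. Unlike the split-based approach, it percent-decodes the target with parse_qs and also drops targets hosted on google.com.

```python
from itertools import islice
from urllib.parse import parse_qs, urlsplit


def top_result_urls(hrefs, limit=5):
    """Return up to `limit` target URLs decoded from Google-style redirect hrefs.

    Hypothetical helper: assumes result links look like /url?q=<target>&sa=U...
    """
    def targets():
        for href in hrefs:
            parts = urlsplit(href)
            # Only redirect links carry a search result; skip tabs, settings, etc.
            if parts.path != "/url":
                continue
            # parse_qs percent-decodes the value of the q parameter
            target = parse_qs(parts.query).get("q", [""])[0]
            if not target.startswith("http"):
                continue
            # Skip cached copies and Google's own pages
            if "webcache" in target or "google.com" in urlsplit(target).netloc:
                continue
            yield target

    # islice stops consuming hrefs once `limit` results have been produced
    return list(islice(targets(), limit))
```

With the soup object from the answer, it could be called as top_result_urls((a["href"] for a in soup.find_all("a", href=True)), limit=5).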


For duckduckgo.com use:

import requests
from bs4 import BeautifulSoup

search = input("Search:")
# Browser-like headers for the HTML-only endpoint
h = {"Host": "duckduckgo.com", "Origin": "https://duckduckgo.com", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
d = {"q": search}
page = requests.post("https://duckduckgo.com/html/", data=d, headers=h)
soup = BeautifulSoup(page.content, "html5lib")
# Select anchors with class "result__a" that have an href
links = soup.find_all("a", class_="result__a", href=True)
for link in links:
    link_href = link.get('href')
    # Skip links that point back to DuckDuckGo itself
    if "https://duckduckgo.com" not in link_href:
        print(link_href)