scrape the about page of websites with Python [closed]
2024/11/18 18:30:24
I am looking to scrape some content from a number of websites for research, and I was hoping that Python and web scraping might speed up the process. I have used Python and Beautiful Soup once before, on a small project that converted an XML file from one format to another.
Answer
Depending on how regular the structure of the data you want to extract is, you could use one of several tools.
If the data is always stored in the same DOM structure, Scrapy could do the job.
If the data is sparse and stored in various places, BeautifulSoup4 or lxml may help.
If the data is generated by JavaScript code, have a look at Selenium.
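As a rough illustration of the "sparse data in various places" case, here is a minimal sketch that finds candidate about-page links in a page's HTML. It uses only the standard library (`html.parser` and `urllib.parse`) as a stand-in so it runs without extra installs; BeautifulSoup4 or lxml would do the same job with less code. The names `AboutLinkFinder`, `find_about_links` and `SAMPLE_HTML`, the `example.com` URL, and the rule "the href or link text contains 'about'" are all assumptions made for this example, not something from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AboutLinkFinder(HTMLParser):
    """Collect <a> links whose href or link text mentions 'about' (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # absolute URLs of candidate about pages
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).lower()
            # Assumed heuristic: match on either the URL or the visible label.
            if "about" in self._href.lower() or "about" in label:
                self.links.append(urljoin(self.base_url, self._href))
            self._href = None


def find_about_links(html, base_url):
    """Return absolute URLs of links that look like they lead to an about page."""
    finder = AboutLinkFinder(base_url)
    finder.feed(html)
    return finder.links


# Made-up sample markup; in practice the HTML would come from the downloaded page.
SAMPLE_HTML = """
<nav>
  <a href="/">Home</a>
  <a href="/company">About us</a>
  <a href="/about.html">Team</a>
  <a href="/contact">Contact</a>
</nav>
"""

if __name__ == "__main__":
    print(find_about_links(SAMPLE_HTML, "https://example.com"))
```

Each returned URL could then be downloaded and parsed in the same way to pull out the text of the about page itself. Real sites label that page inconsistently, so the matching rule would likely need tuning per site.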
Here are a couple of resources you might find useful:
PyCon 2012 Tutorial about web-scraping: http://pyvideo.org/video/609/web-scraping-reliably-and-efficiently-pull-data/
http://isbullsh.it/2012/04/Web-crawling-with-scrapy/ (full disclosure, I wrote that)