scrape the about page of websites with Python [closed]

2024/11/18 18:30:24

I am looking to scrape some content from some websites for research and I was hoping that using python and web scraping might speed up my process. I have used python and beautiful soup before for one small project to convert an xml from one format to another.

Answer

Depending on how redundant is the structure of the data you want to extract, you could use several tools.

  • If you're looking for extracting data always stored in the same DOM structure, Scrapy could do the job.
  • If the data is sparse and is stored in various places, maybe BeautfulSoup4 or lxml could help you.
  • If the data is generated by some JS code, have a look at Selenium

Here are a couple of resources you might find useful:

  • PyCon 2012 Tutorial about web-scraping: http://pyvideo.org/video/609/web-scraping-reliably-and-efficiently-pull-data/
  • http://isbullsh.it/2012/04/Web-crawling-with-scrapy/ (full disclosure, I wrote that)
  • http://www.packtpub.com/article/web-scraping-with-python
  • http://wwwsearch.sourceforge.net/mechanize/
https://en.xdnf.cn/q/120041.html

Related Q&A

Tkinter entry widget execution [duplicate]

This question already exists:Entry widget in tkinterClosed 2 years ago.So I made a simple program however it doesnt seem to work my code is: e = Entry(root, font = 20,borderwidth=5)e.grid(row=1)def cap…

Python Redefine the variable given as a parameter to the function

Hello I am trying to make theme window with tkinter. There is 5 variable for different widgetss color. I will use color dialog for choosing colors but I dont want to define 5 functions. So I think I ca…

Need a Gui Keypad for a touchscreen that outputs a pin when code is correct

I have a raspberry pi with a touchscreen running raspbian, Im hoping to have a Gui on the touchscreen that had a number keypad that when a correct input is entered a pin will output to a door latch or …

How can i make the python to wait till i complete speaking?

I am writing a program to recognise the speech from a microphone and the code will process accordingly. The code I wrote for this purpose is below.import speech_recognition as sr import webbrowser impo…

Facing AttributeError: list object has no attribute lower

I have posted my sample train data as well as test data along with my code. Im trying to use Naive Bayes algorithm to train the model.But, in the reviews Im getting list of list. So, I think my code is…

why I cannot use max() function in this case? [duplicate]

This question already has answers here:Why do I get "TypeError: int object is not iterable" when trying to sum digits of a number? [duplicate](4 answers)Closed 1 year ago.n,m,k=map(int, inpu…

SQLALchemy and Python - Getting the SQL result

I am using cloudkitty which is rating module in OpenStacks.But here question is regarding the SQLAlchemy and Python.I am new to SQLAlchemy.I need to fetch some details from a table using a API call.So …

ValueError: invalid literal for int() with base 10: python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.Closed 10 years ago.Questions asking for code must demonstrate a minimal understanding of the proble…

python code to connect to sftp server

I found this code to connect to remote sftp server with the help of username ,password and host but i also need to include the port number, can any one let em know how to include the port number in thi…

Python: get the max value with the location above and below than the max

If I have a dataframe like this, index User Value location1 1 1.0 4.5 2 1 1.5 5.23 1 3.0 7.04 1 2.5 7.55 2 1.0 11.56 2 1.…