I have been doing some research into web scraping and noticed that it seems to be done mainly in Python. Is there any benefit to using a Python-based solution over PHP? Are there performance issues, and so forth?
In my opinion, I would go with Python because of its excellent string handling capabilities compared to PHP. Python also has a lot of cool libraries that make scraping web pages a breeze.
Some libraries you should check out are:
Beautiful Soup
Scrapy
I have personally used BeautifulSoup, and it's simple and really powerful.
Check out this piece of code from their documentation:
# Python 2 with BeautifulSoup 3
import urllib2
from BeautifulSoup import BeautifulSoup

# Download the page and parse it
page = urllib2.urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page)

# Each incident sits in a table cell that is 90% wide
for incident in soup('td', width="90%"):
    where, linebreak, what = incident.contents[:3]
    print where.strip()
    print what.strip()
    print
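
The snippet above is written for Python 2 and BeautifulSoup 3. If you are on Python 3 and want to try the same idea without installing anything, here is a rough sketch that uses only the standard library's html.parser instead of BeautifulSoup. It is not from the BeautifulSoup docs: the sample HTML, the CellTextParser class and the extract_cells helper are all made up for illustration, and a real page would need more careful handling.

```python
from html.parser import HTMLParser

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><td width="10%">1</td><td width="90%">Gulf of Aden<br>Tanker approached by skiff</td></tr>
  <tr><td width="10%">2</td><td width="90%">Malacca Strait<br>Attempted boarding</td></tr>
</table>
"""


class CellTextParser(HTMLParser):
    """Collect the text fragments of every <td width="90%"> cell."""

    def __init__(self):
        super().__init__()
        self.cells = []        # one list of text fragments per matching cell
        self._in_cell = False  # True while inside a matching <td>

    def handle_starttag(self, tag, attrs):
        # Only cells with width="90%" are of interest, as in the snippet above.
        if tag == "td" and dict(attrs).get("width") == "90%":
            self._in_cell = True
            self.cells.append([])

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_cell and text:
            self.cells[-1].append(text)


def extract_cells(html):
    """Return a list of text-fragment lists, one per matching cell."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells


if __name__ == "__main__":
    for where, what in extract_cells(SAMPLE_HTML):
        print(where)
        print(what)
        print()
```

This takes a lot more code than the BeautifulSoup version, which is pretty much the point: the library does the bookkeeping for you.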