Scraping web pages with Python vs PHP? [closed]

2024/10/5 15:06:12

I have been doing some research into web scraping and noticed that it seems to be done mainly in Python. Is there any benefit to using a Python-based solution over PHP? Are there performance issues, and so forth?

Answer

In my opinion, I would go with Python because of its excellent string-handling capabilities compared to PHP. Python also has a lot of cool libraries that make scraping web pages a joy.

Some libraries you should check out are:

Beautiful Soup

Scrapy

I have personally used BeautifulSoup, and it's simple and really powerful.

Check out this piece of code from their documentation:

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page)
for incident in soup('td', width="90%"):
    where, linebreak, what = incident.contents[:3]
    print where.strip()
    print what.strip()
    print
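
The snippet above targets Python 2 (urllib2, print statements) and the older BeautifulSoup 3 import path. As a rough sketch of the same idea in Python 3, assuming the third-party beautifulsoup4 package is installed, something like the following should work. The SAMPLE_HTML table and the extract_incidents helper are made up for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical stand-in for the downloaded page (assumption, not the real report).
SAMPLE_HTML = """
<table>
  <tr><td width="90%">Gulf of Aden<br>Tanker approached by skiffs</td></tr>
  <tr><td width="90%">Malacca Strait<br>Attempted boarding at night</td></tr>
  <tr><td width="10%">ignored cell</td></tr>
</table>
"""


def extract_incidents(html):
    """Return (where, what) pairs from every <td width="90%"> cell."""
    soup = BeautifulSoup(html, "html.parser")
    incidents = []
    for cell in soup.find_all("td", width="90%"):
        # Each matching cell is assumed to contain: text, <br>, text.
        where, _linebreak, what = cell.contents[:3]
        incidents.append((where.strip(), what.strip()))
    return incidents


if __name__ == "__main__":
    for where, what in extract_incidents(SAMPLE_HTML):
        print(where)
        print(what)
        print()
```

To run this against a live page instead of the inline sample, the HTML could be fetched with urllib.request.urlopen(url).read() from the standard library and passed to the same helper.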
