Is it possible to scrape webpage without using third-party libraries in python?

2024/10/6 2:21:24

I am trying to understand how beautiful soup works in python. I used beautiful soup,lxml in my past but now trying to implement one script which can read data from given webpage without any third-party libraries but it looks like xml module don't have much options and throwing many errors. Is there any other library with good documentation for reading data from web page? I am not using these scripts on any particular websites. I am just trying to read from public pages and news blogs.

Answer

Third party libraries exist to make your life easier. Yes, of course you could write a program without them (the authors of the libraries had to). However, why reinvent the wheel?

Your best options are beautifulsoup and scrappy. However, if your having trouble with beautifulsoup, I wouldn't try scrappy.

Perhaps you can get by with just the plain text from the website?

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
pagetxt = soup.get_text()

Then you can be done with all external libraries and just work with plain text. However, if you need to do something more complicated. HTML is something you really should use a library for manipulating. They is just too much that can go wrong.

https://en.xdnf.cn/q/118996.html

Related Q&A

Different model performance evaluations by statsmodels and scikit-learn

I am trying to fit a multivariable linear regression on a dataset to find out how well the model explains the data. My predictors have 120 dimensions and I have 177 samples:X.shape=(177,120), y.shape=(…

Python to search CSV file and return relevant info

I have a csv file with information about some computers on our network. Id like to be able to type from the command line a quick line to bring me back the relevant items from the csv. In the format:$…

Remove all elements matching a predicate from a list in-place

How can I remove all elements matching a predicate from a list IN-PLACE? T = TypeVar("T")def remove_if(a: list[T], predicate: Callable[[T], bool]):# TODO: Fill this in.# Test: a = [1, 2, 3, …

Python-scriptlines required to make upload-files from JSON-Call

Lacking experience & routine for Python-programming. Borrowing examples from forums have made a script which successfully makes a JSON-cal (to Domoticz) and generates & prints the derived JSON-…

python pygame mask collision [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 4 years ago.The com…

How to find max average of values by converting list of tuples to dictionary?

I want to take average for all players with same name. I wrote following code. its showing index error whats the issue?Input: l = [(Kohli, 73), (Ashwin, 33), (Kohli, 7), (Pujara, 122),(Ashwin, 90)]Out…

Cannot pass returned values from function to another function in python

My goal is to have a small program which checks if a customer is approved for a bank loan. It requires the customer to earn > 30k per year and to have atleast 2 years of experience on his/her curren…

What is the difference here that prevents this from working?

Im reading a list of customer names and using each to find an element. Before reading the list, I make can confirm this works when I hard-code the name,datarow = driver.find_element_by_xpath("//sp…

How can I extract numbers based on context of the sentence in python?

I tried using regular expressions but it doesnt do it with any context Examples:: "250 kg Oranges for Sale" "I want to sell 100kg of Onions at 100 per kg"

Chart with secondary y-axis and x-axis as dates

Im trying to create a chart in openpyxl with a secondary y-axis and an DateAxis for the x-values.For this MWE, Ive adapted the secondary axis example with the DateAxis example.from datetime import date…