Question 1

I am trying to understand how Beautiful Soup works in Python. I have used Beautiful Soup and lxml in the past, but now I am trying to write a script that reads data from a given webpage without any third-party libraries. The built-in xml module doesn't seem to have many options, though, and it throws many errors. Is there any other library with good documentation for reading data from a web page? I am not targeting any particular website; I am just trying to read from public pages and news blogs.

Question 2

Third party libraries exist to make your life easier. Yes, of course you could write a program without them (the authors of the libraries had to). However, why reinvent the wheel?

Your best options are BeautifulSoup and Scrapy. However, if you're having trouble with BeautifulSoup, I wouldn't try Scrapy.

Perhaps you can get by with just the plain text from the website?

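from bs4 import BeautifulSoup

# html_doc is the page's HTML source as a string
soup = BeautifulSoup(html_doc, 'html.parser')
# get_text() strips the markup and returns only the text
pagetxt = soup.get_text()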

After that you are done with external libraries and can just work with plain text. However, if you need to do anything more complicated, HTML is something you really should manipulate with a library. There is just too much that can go wrong.
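If you really do want to stay inside the standard library, plain text extraction is about the one job that is reasonable to do by hand. Here is a rough sketch using the built-in html.parser module. The names TextExtractor and extract_text and the sample string are made up for illustration, and this will not cope with badly broken HTML as gracefully as BeautifulSoup does.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample document, only for demonstration
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>News</h1><p>Hello &amp; welcome</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(sample))
```

To run it on a live page you would first fetch the HTML, for example with urllib.request.urlopen(url).read() and a decode to str, which is also part of the standard library. Anything beyond plain text (selecting elements, following links, handling odd encodings) is where a real library starts to pay off.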

Is it possible to scrape a webpage without using third-party libraries in Python?
