How to save web page as text file?

2024/10/10 11:26:54

I would like to save a web page (all content) as a text file. (As if you did right click on webpage -> "Save Page As" -> "Save as text file" and not as html file)

I have tried using the following code:

import urllib2
url=''
page = urllib2.urlopen(url)
page_content = page.read()
file = open('file_text.txt', 'w')
f.write(page_content)
f.close()

My goal is to be able to save a whole text without html code. (for example i would like read "è" instead "&eacute")

Answer

Have a look at html2text as mentioned elsewhere

import urllib2
import html2text
url=''
page = urllib2.urlopen(url)
html_content = page.read()
rendered_content = html2text.html2text(html_content)
file = open('file_text.txt', 'w')
file.write(rendered_content)
file.close()
https://en.xdnf.cn/q/69901.html

Related Q&A

PIL Image.size returns the opposite width/height

Using PIL to determine width and height of imagesOn a specific image (luckily only this one - but it is troubling) the width/height returning from image.size is the opposite. The image: http://storage.…

Django Rest Framework: Register multiple serializers in ViewSet

Im trying to create a custom API (not using models), but its not showing the request definition in the schema (in consequence, not showing it in swagger). My current code is:views.pyclass InfoViewSet(v…

Which is most accurate way to distinguish one of 8 colors?

Imagine we how some basic colors:RED = Color ((196, 2, 51), "RED") ORANGE = Color ((255, 165, 0), "ORANGE") YELLOW = Color ((255, 205, 0), "YELLOW") GREEN = Color ((0, 128…

Lambda function behavior with and without keyword arguments

I am using lambda functions for GUI programming with tkinter. Recently I got stuck when implementing buttons that open files: self.file="" button = Button(conf_f, text="Tools opt.",…

How to get type annotation within python function scope?

For example: def test():a: intb: strprint(__annotations__) test()This function call raises a NameError: name __annotations__ is not defined error. What I want is to get the type annotation within the f…

General explanation of how epoll works?

Im doing a technical write-up on switching from a database-polling (via synchronous stored procedure call) to a message queue (via pub/sub). Id like to be able to explain how polling a database is vas…

How to return cost, grad as tuple for scipys fmin_cg function

How can I make scipys fmin_cg use one function that returns cost and gradient as a tuple? The problem with having f for cost and fprime for gradient, is that I might have to perform an operation twice…

n-gram name analysis in non-english languages (CJK, etc)

Im working on deduping a database of people. For a first pass, Im following a basic 2-step process to avoid an O(n^2) operation over the whole database, as described in the literature. First, I "b…

How to retrieve all the attributes of LDAP database

I am using ldap module of python to connect to ldap server. I am able to query the database but I dont know how to retrieve the fields present in the database, so that I can notify the user in advance …

Why would this dataset implementation run out of memory?

I follow this instruction and write the following code to create a Dataset for images(COCO2014 training set)from pathlib import Path import tensorflow as tfdef image_dataset(filepath, image_size, batch…