How to convert BeautifulSoup.ResultSet to string

2024/10/6 16:17:56

I parsed an HTML page with BeautifulSoup's .findAll and stored the output in a variable named result. If I type result in the Python shell and press Enter, I see normal text as expected. But when I tried to post-process this result as a string object, I noticed that str(result) returns garbage, like this sample:

\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />\n<hr />\n</div>

The HTML page source is UTF-8 encoded.

How can I handle this?


Code is basically this, in case it matters:

import urllib
from BeautifulSoup import BeautifulSoup
# In Python 2 the function is urllib.urlopen; urllib.open does not exist
soup = BeautifulSoup(urllib.urlopen(url).read())
result = soup.findAll(something)

Python is 2.7

Answer

Python 2.6.7, BeautifulSoup.__version__ 3.2.0

This worked for me:

u'\n'.join(map(unicode, result))

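I'm pretty sure result is a BeautifulSoup.ResultSet object, which seems to be an extension of the standard Python list. That is why the line above converts each element to unicode on its own and then joins the pieces, instead of calling str() on the whole list.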
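
To illustrate the idea without BeautifulSoup installed, here is a minimal sketch in Python 3 using only the standard library. FakeTag and FakeResultSet are hypothetical stand-ins invented for this example; they are not BeautifulSoup classes and only imitate the behaviour described above (a list subclass whose elements have an escaped repr and a readable text form).

```python
# Python 3 sketch; FakeTag and FakeResultSet are hypothetical stand-ins,
# not BeautifulSoup classes.

class FakeTag:
    """Holds a piece of markup, loosely imitating a parsed tag."""

    def __init__(self, markup):
        self.markup = markup

    def __repr__(self):
        # Escaped, ASCII-only form: what shows up inside str(some_list).
        return ascii(self.markup)

    def __str__(self):
        # Readable text form.
        return self.markup


class FakeResultSet(list):
    """A plain list subclass, as ResultSet appears to be."""


result = FakeResultSet([
    FakeTag('<a>\u0447\u0438\u043b\u043d\u0438\u0446\u0430</a>'),
    FakeTag('<hr />'),
])

# str() on a list formats every element with repr(), so escapes appear.
whole_list = str(result)

# Converting each element on its own and joining keeps the text readable.
joined = '\n'.join(map(str, result))

# The escaped bytes from the question are ordinary UTF-8 and decode cleanly.
sample = b'\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0'
decoded = sample.decode('utf-8')
```

In the Python 2 answer above, unicode(...) plays the role that str(...) plays in this sketch; the pattern of converting element by element and then joining is the same.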

