python - beautifulsoup - TypeError: sequence item 0: expected string, Tag found

2024/11/15 11:55:21

I'm using BeautifulSoup to extract images and links from an HTML string. It works fine in general, but it throws an error for links whose contents include a nested tag.

Example Link:

<a href="http://www.example.com"><strong>Link Text</strong></a>

Python Code:

soup = BeautifulSoup(contents)
links = soup.findAll('a')
for link in links:
    print link.contents  # generates error
    print str(link.contents)  # outputs [Link Text]

Error Message:

TypeError: sequence item 0: expected string, Tag found

I don't really want to loop through the child tags in the link text; I simply want to return the raw contents. Is this possible with BeautifulSoup?

Answer

To grab just the text content of a tag, use the element.get_text() method. It returns the (optionally stripped) text of the current element and of any tags nested inside it, with the markup itself left out:

print link.get_text(' ', strip=True)

The first argument is the separator used to join all text elements, and setting strip to True means each text element is first stripped of leading and trailing whitespace. This gives you neatly processed text in most cases.

You can also use the .stripped_strings iterable:

print u' '.join(link.stripped_strings)

which has essentially the same effect, but lets you process or filter the stripped strings first.
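As a minimal sketch of the two text-only approaches, here is a Python 3 version. It assumes the bs4 package (imported as `from bs4 import BeautifulSoup`) and the built-in "html.parser" backend; the sample markup and the variable names are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample link with nested tags and stray whitespace.
html = '<a href="http://www.example.com"><strong> Link </strong> <em>Text</em></a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

# Join every text fragment with a single space, stripping each fragment first.
joined = link.get_text(" ", strip=True)

# The same fragments as an iterable, so they can be inspected or filtered.
pieces = list(link.stripped_strings)

# Example of filtering before joining: drop one fragment.
filtered = " ".join(s for s in link.stripped_strings if s != "Text")

print(joined)    # Link Text
print(pieces)    # ['Link', 'Text']
print(filtered)  # Link
```

Whitespace-only fragments (such as the space between the two nested tags here) are dropped by both approaches, which is why the result has a single separator between the words.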

To get the raw contents, markup included, call str() or unicode() on each child item:

print u''.join(unicode(item) for item in link)

which works for both Tag and NavigableString children.
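The following Python 3 sketch shows where the TypeError comes from and how serializing each child avoids it. It assumes bs4 with the "html.parser" backend; the variable names are illustrative only. In Python 3 there is no unicode(), so str() plays that role, and the error message wording differs slightly from the Python 2 one quoted in the question.

```python
from bs4 import BeautifulSoup

html = '<a href="http://www.example.com"><strong>Link Text</strong></a>'
link = BeautifulSoup(html, "html.parser").find("a")

# Joining the children directly fails: the first child is a Tag, not a string.
try:
    "".join(link.contents)
    join_error = None
except TypeError as exc:
    join_error = exc

# Converting each child to a string first gives the inner markup.
inner_html = "".join(str(child) for child in link.contents)

# bs4 also offers decode_contents(), which renders the children in one call.
same_html = link.decode_contents()

print(type(join_error).__name__)  # TypeError
print(inner_html)                 # <strong>Link Text</strong>
```

Plain text children are NavigableString objects, which are already strings, so the join only breaks once a nested tag appears. That matches the behaviour described in the question.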
