Beautiful Soup missing some HTML table tags

2024/10/13 19:16:55

I'm trying to extract data from a website, using Beautiful Soup to parse the HTML. Specifically, I want the table data from the following webpage:

link to webpage

I want to get the data from the table. I first save the page as an HTML file on my computer (this part works fine; I checked that the file contains all the information). But when I try to parse it with the following code:

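from bs4 import BeautifulSoup

# fh is the open file handle of the saved HTML page
soup = BeautifulSoup(fh, 'html.parser')
table = soup.find_all('table')
rows = table[0].find_all('tr')  # every <tr> in the first table
cells = rows[1].find_all('td')  # data cells of the second row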

I don't get any results. Specifically, it crashes with an error saying there is no element at index 1. Any idea where this could come from?

Thanks

Answer

OK, it was actually a problem in the HTML file itself: in the first row of the table, the cells were opened with th but closed with td. I don't know much about HTML, but replacing the th with td solved the issue. The faulty markup looked like this:

<tr class="listeEtablenTete">
<th title="Rubrique IC">Rubri. IC</td>
<th title="Alin&eacute;a">Ali.&nbsp;</td>
<th title="Date d'autorisation">Date auto.</td>
<th >Etat d'activit&eacute;</td>
<th title="R&eacute;gime">R&eacute;g.</td>
<th >Activit&eacute;</td>
<th >Volume</td>
<th >Unit&eacute;</td>
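The manual fix described above could also be applied in code before the page is handed to the parser. The following is only a minimal sketch under stated assumptions: `repair_header_cells` is a hypothetical helper name, it uses the standard `re` module, and it assumes the header cells contain plain text with no nested tags, as in the excerpt. Unlike the fix in the answer, it keeps the `th` opening tag and corrects the closing tag instead.

```python
import re

# Matches a <th ...> cell whose text is wrongly terminated by </td>.
# The text part may not contain '<', so cells with nested tags are left untouched.
_TH_CLOSED_BY_TD = re.compile(r"(<th\b[^>]*>)([^<]*)</td>", re.IGNORECASE)


def repair_header_cells(html: str) -> str:
    """Rewrite '<th ...>text</td>' as '<th ...>text</th>' (hypothetical helper)."""
    return _TH_CLOSED_BY_TD.sub(r"\1\2</th>", html)


# Shortened sample modelled on the excerpt above.
broken = (
    '<tr><th title="Rubrique IC">Rubri. IC</td><th >Volume</td></tr>'
    "<tr><td>1</td></tr>"
)
fixed = repair_header_cells(broken)
print(fixed)
```

The repaired string can then be passed to `BeautifulSoup(fixed, 'html.parser')` in place of the file handle used in the question.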

Thanks !
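For other saved pages, a mismatch like this can be hard to spot by eye. One possible way to look for it is sketched below using the standard library's `html.parser.HTMLParser`; the class `StrayEndTagFinder` and the function `find_stray_end_tags` are hypothetical names, not part of Beautiful Soup. The sketch only reports closing tags that have no open element of the same name, so it would not flag a `</td>` that happens to match a cell of an enclosing table.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StrayEndTagFinder(HTMLParser):
    """Collect closing tags that match no currently open element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of open tag names
        self.stray = []      # (line, innermost open tag, stray closing tag)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> open and close nothing

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the matching element; anything skipped was left unclosed.
            while self.open_tags.pop() != tag:
                pass
        else:
            line, _ = self.getpos()
            innermost = self.open_tags[-1] if self.open_tags else None
            self.stray.append((line, innermost, tag))


def find_stray_end_tags(html):
    """Return a list of (line, innermost open tag, stray closing tag) tuples."""
    finder = StrayEndTagFinder()
    finder.feed(html)
    finder.close()
    return finder.stray


sample = "<table>\n<tr>\n<th>A</td>\n<th>B</td>\n</tr>\n<tr><td>1</td><td>2</td></tr>\n</table>"
for line, innermost, found in find_stray_end_tags(sample):
    print(f"line {line}: </{found}> found while <{innermost}> is open")
```

Each reported entry points at a closing tag worth checking by hand in the saved file.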

