Question 1

I'm trying to extract data from a website, using Beautiful Soup to parse the HTML. I'm currently trying to get the table data from the following webpage:

link to webpage

I want to get the data from the table. First I save the page as an HTML file on my computer (this part works fine; I checked that I got all the information), but when I try to parse it with the following code:

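from bs4 import BeautifulSoup

# fh is the open file handle of the saved HTML page
soup = BeautifulSoup(fh, 'html.parser')
tables = soup.find_all('table')
rows = tables[0].find_all('tr')   # every <tr> in the first table
cells = rows[1].find_all('td')    # the <td> cells of the second row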

I don't get any results (specifically, it crashes, saying there's no element at index 1). Any idea what could be causing this?

Thanks

Question 2

OK, it was actually an issue in the HTML file: in the first row, the cells were opened with <th> but closed with </td>. I don't know much about HTML, but replacing each <th> with <td> solved the issue. This is what the broken header row looked like:

<tr class="listeEtablenTete">
<th title="Rubrique IC">Rubri. IC</td>
<th title="Alin&eacute;a">Ali.&nbsp;</td>
<th title="Date d'autorisation">Date auto.</td>
<th >Etat d'activit&eacute;</td>
<th title="R&eacute;gime">R&eacute;g.</td>
<th >Activit&eacute;</td>
<th >Volume</td>
<th >Unit&eacute;</td>
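
For anyone hitting the same thing, here is a rough sketch of doing that replacement in code rather than by hand. The function name `fix_mismatched_header_cells` is made up for this example, and it assumes the broken cells contain only text and entities (no nested tags), as in the row above. It only rewrites cells that open with `<th>` and close with `</td>`; correctly closed `<th>...</th>` cells are left alone.

```python
import re

# Matches a cell opened with <th ...> but closed with </td>, whose
# content has no nested tags (assumption: plain text or entities only).
_MISMATCHED_CELL = re.compile(r'<th\b([^>]*)>([^<]*)</td>', re.IGNORECASE)


def fix_mismatched_header_cells(html):
    """Rewrite '<th ...>text</td>' as '<td ...>text</td>'.

    Hypothetical helper mirroring the manual fix (th replaced by td).
    Attributes on the opening tag are kept unchanged.
    """
    return _MISMATCHED_CELL.sub(r'<td\1>\2</td>', html)


broken = (
    '<tr class="listeEtablenTete">\n'
    '<th title="Rubrique IC">Rubri. IC</td>\n'
    '<th >Volume</td>\n'
    '</tr>'
)
fixed = fix_mismatched_header_cells(broken)
print(fixed)
```

The repaired string can then be passed to BeautifulSoup in place of the raw file contents.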

Thanks!

Beautiful soup missing some html table tags
