Question 1

I am scraping election data from a website and trying to store it in a DataFrame:

import pandas as pd
import bs4
import requests

columns = ['Candidate', 'Party', 'Criminal Cases', 'Education', 'Age', 'Total Assets', 'Liabilities']
df = pd.DataFrame(columns=columns)
ind = 1
url = requests.get("http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341")
soup = bs4.BeautifulSoup(url.content)
for content in soup.findAll("td")[16:]:
    df.iloc[ind // 7, ind % 7 - 1] = content.text
    ind += 1
print(df)

Each iteration gives me one value (content.text), which I want to put into the table. The loop should populate df in the following order:

df[0,0], df[0,1], df[0,2], ..., df[1,0], df[1,1], ...

and so on. Unfortunately, iloc throws the error "single positional indexer is out-of-bounds". The strange part is that when I run df.iloc[0,0] = content.text outside the for loop (in a separate cell, for testing), it works, but inside the loop it raises the error. I suspect it's something trivial, but I can't figure it out. Please help.
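Separately from the iloc error, the index arithmetic does not quite produce that order. Because ind starts at 1, ind % 7 - 1 evaluates to -1 for the seventh value, so that value is written to the last column of the next row, and the last column of the current row is never filled. A minimal plain-Python sketch (the helper names are hypothetical, for illustration only) comparing the loop's formula with divmod on a counter that starts at 0:

```python
NCOLS = 7  # number of columns in the question's table


def question_position(ind):
    """(row, column) computed by the question's loop, where ind starts at 1."""
    return ind // NCOLS, ind % NCOLS - 1


def intended_position(i):
    """(row, column) for the i-th cell when counting from 0."""
    return divmod(i, NCOLS)


# Compare the first NCOLS + 1 cells side by side.
for n in range(NCOLS + 1):
    print(n, question_position(n + 1), intended_position(n))
```

For the seventh cell (n = 6) the loop's formula gives (1, -1), while divmod gives (0, 6).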

Answer

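DataFrame.iloc cannot enlarge its target object. That used to be the error message, but it has changed since pandas 0.15. Your DataFrame is created with columns but no rows, so every positional row index is out of bounds, and iloc will not add rows for you.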
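A minimal sketch of the difference, using two placeholder columns and values rather than the real table: iloc refuses to write to a row position that does not exist, while assigning through loc to a new row label appends that row ("setting with enlargement"). The exact text of the IndexError differs between pandas versions, so the sketch only checks the exception type.

```python
import pandas as pd

columns = ['Candidate', 'Party']
df = pd.DataFrame(columns=columns)  # two columns, zero rows

iloc_failed = False
try:
    df.iloc[0, 0] = 'cand_1'  # row position 0 does not exist, and iloc never adds rows
except IndexError as exc:
    iloc_failed = True
    print('iloc raised IndexError:', exc)

# .loc can enlarge: assigning to a row label that does not exist yet appends that row.
df.loc[0] = ['cand_1', 'party_1']
print(df)
```

This still grows the frame one row at a time, so it removes the error but not the inefficiency.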

In general, a DataFrame is not meant to be built one row at a time; doing so is very inefficient. Instead, collect the data in an ordinary Python structure (such as a list of lists) and build the DataFrame from it in one go:

# soup and columns as defined in the question's code
table = soup.find(id='table1')
rows = table.find_all('tr')[1:]  # skip the header row
data = [[cell.text for cell in row.find_all('td')] for row in rows]
df = pd.DataFrame(data=data, columns=columns)

From inspecting the page you requested, it seems you were after the table with the id "table1". Its first row is the header (a poor choice by the page's authors; it should have been in <thead>, not in the body), so skip that row ([1:]) and build a list of lists from the cells of the remaining rows.
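If you prefer to keep the question's approach of walking a flat list of <td> texts, the same idea applies: slice the flat list into rows of len(columns) cells, then build the DataFrame once. A minimal sketch, with synthetic placeholder strings standing in for the scraped cell texts and assuming the number of cells is an exact multiple of the column count:

```python
import pandas as pd

columns = ['Candidate', 'Party', 'Criminal Cases', 'Education', 'Age', 'Total Assets', 'Liabilities']
width = len(columns)

# Placeholder cell texts ('r<row>c<col>') standing in for
# [content.text for content in soup.findAll("td")[16:]] from the question.
cells = [f'r{r}c{c}' for r in range(2) for c in range(width)]

# Slice the flat list into consecutive chunks of `width` cells: one chunk per row.
rows = [cells[i:i + width] for i in range(0, len(cells), width)]

# Build the DataFrame once, from the complete list of rows.
df = pd.DataFrame(rows, columns=columns)
print(df)
```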

Of course, you could also let pandas handle the parsing entirely:

url = "http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341"
# read_html returns a list with one DataFrame per <table>; [2] picks the third one
df = pd.read_html(url, header=0)[2]
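Picking a table by position ([2]) breaks if the page gains or loses a table. read_html also accepts an attrs argument that keeps only tables whose HTML attributes match, so you could select the table by its id instead. A minimal offline sketch, with a small inline HTML snippet and placeholder values in place of the real page; it assumes one of the parser backends read_html needs (lxml, or BeautifulSoup with html5lib) is installed:

```python
from io import StringIO

import pandas as pd

# Placeholder HTML standing in for the real page: a decoy table plus the one we want.
html = """
<table id="other"><tr><td>not this one</td></tr></table>
<table id="table1">
  <tr><td>Candidate</td><td>Party</td></tr>
  <tr><td>cand_1</td><td>party_1</td></tr>
  <tr><td>cand_2</td><td>party_2</td></tr>
</table>
"""

# attrs keeps only tables whose attributes match; header=0 turns the first row into column names.
tables = pd.read_html(StringIO(html), attrs={'id': 'table1'}, header=0)
df = tables[0]
print(df)
```

With the real page you would pass the URL instead of StringIO(html); whether attrs={'id': 'table1'} matches depends on the page still using that id.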

Python 3.x - iloc throws error - single positional indexer is out-of-bounds
