Python 3.x - iloc throws error - single positional indexer is out-of-bounds

2024/10/14 6:17:31

I am scraping election data from a website and trying to store it in a dataframe

import pandas as pd
import bs4
import requests

columns = ['Candidate','Party','Criminal Cases','Education','Age','Total Assets','Liabilities']
df = pd.DataFrame(columns = columns)
ind=1
url = requests.get("http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341")
soup = bs4.BeautifulSoup(url.content)
for content in soup.findAll("td")[16:]:
    df.iloc[ind//7,ind%7-1] = content.text
    ind=ind+1
print(df)

Essentially, each iteration of content.text will provide me a value which I will populate in the table. The loop will populate values to df in the following sequence -

df[0,0]
df[0,1]
df[0,2]
.
.
.
df[1,0]
df[1,1]
.
.

and so on. Unfortunately, iloc throws an error: "single positional indexer is out-of-bounds". The funny part is that when I try df.iloc[0,0] = content.text outside the for loop (in a separate cell, for testing), the code works properly, but inside the for loop it raises the error. I believe it might be something trivial, but I am unable to understand it. Please help.

Answer

DataFrame.iloc cannot enlarge its target object. This used to be the error message, but the wording has changed since pandas version 0.15.
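
A minimal sketch of that behaviour, assuming a reasonably recent pandas. The column names and values are made up for illustration. Assigning through iloc to a position that does not exist raises an IndexError (the exact message depends on the pandas version), while the same assignment succeeds once the rows have been allocated:

```python
import pandas as pd

# Hypothetical column names, for illustration only.
columns = ["Candidate", "Party"]

# A frame with columns but no rows: position (0, 0) does not exist yet.
empty = pd.DataFrame(columns=columns)

try:
    empty.iloc[0, 0] = "A. Candidate"
    raised = False
except IndexError:
    # iloc refuses to grow the frame; the message text varies by version.
    raised = True

# With rows allocated up front, the same positional assignment is in bounds.
prealloc = pd.DataFrame(index=range(2), columns=columns)
prealloc.iloc[0, 0] = "A. Candidate"
prealloc.iloc[0, 1] = "Some Party"

print(raised)
print(prealloc)
```

Preallocating only works if the number of rows is known in advance, which is rarely the case when scraping.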

In general, a DataFrame is not meant to be built one row at a time; doing so is very inefficient. Instead, you should build a more traditional data structure and populate a DataFrame from it:

table = soup.find(id='table1')
rows = table.find_all('tr')[1:]
data = [[cell.text for cell in row.find_all('td')] for row in rows]
df = pd.DataFrame(data=data, columns=columns)

From inspecting the page in your request, it seems you were after the table with the id "table1", whose first row is the header (a poor choice by the authors of that page; it should have been in <thead>, not the body). So skip the first row ([1:]) and then build a list of lists from the cells of the remaining rows.
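
The same idea can be tried without a network request. The sketch below uses a small made-up HTML string as a stand-in for the real page (the table contents are invented, and only the id "table1" comes from the page discussed above). It assumes bs4 is installed and uses Python's built-in "html.parser". As a variation, it reads the column names from the header row instead of hard-coding them:

```python
import bs4
import pandas as pd

# Made-up stand-in for the real page: the header sits in the first <tr>.
html = """
<table id="table1">
  <tr><td>Candidate</td><td>Party</td><td>Age</td></tr>
  <tr><td>Alice</td><td>Party X</td><td>41</td></tr>
  <tr><td>Bob</td><td>Party Y</td><td>56</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
table = soup.find(id="table1")
all_rows = table.find_all("tr")

# Take the column names from the header row, then skip it for the data.
columns = [cell.text for cell in all_rows[0].find_all("td")]
data = [[cell.text for cell in row.find_all("td")] for row in all_rows[1:]]

# Build the DataFrame in one step from the list of lists.
df = pd.DataFrame(data=data, columns=columns)
print(df)
```

Every scraped cell arrives as a string, so a numeric column such as Age would still need converting (for example with pd.to_numeric) before doing arithmetic on it.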

Of course you could also just let pandas worry about parsing and all:

url = "http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341"
df = pd.read_html(url, header=0)[2]  # Pick the 3rd table in the page