Python slow on fetchone, hangs on fetchall


I'm writing a script that runs a SELECT query against a database and parses through ~33,000 records. Unfortunately, I'm running into problems at the cursor.fetchone()/cursor.fetchall() stage.

I first tried iterating through the cursor a record at a time like so:

# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
while True:
    # Get the next row in the cursor
    row = cursor.fetchone()
    if row is None:
        break
    # TODO: Determine if there's any kanji in row[2]
    weight = float(row[3] + row[4]) / 2
    printStatus("Weight: " + str(weight))

Based on the output of printStatus (it prints a timestamp plus whatever string is passed to it), the script took approximately 1 second to process each row. This led me to believe that the query was being re-run on each loop iteration (with a LIMIT 1 or something), since it took ~1 second for the same query to run once in something like SQLiteStudio *and* return all 33,000 rows. I calculated that, at that rate, it would take around 7 hours to get through all 33,000 records.
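A quick way to confirm that the time is going into the fetch itself rather than the loop body is to time a single call in isolation (a minimal sketch, reusing the cursor and printStatus from the snippet above):

import time

start = time.time()
row = cursor.fetchone()
printStatus("fetchone took %.3f seconds" % (time.time() - start))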

Instead of sitting through that, I tried to use cursor.fetchall() instead:

results = cursor.fetchall()

# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
for row in results:
    # TODO: Determine if there's any kanji in row[2]
    weight = float(row[3] + row[4]) / 2
    printStatus("Weight: " + str(weight))

Unfortunately, the Python executable locked up at 25% CPU and ~6MB of RAM when it got to the cursor.fetchall() line. I left the script running for ~10 minutes, but nothing happened.

Is ~33,000 returned rows (about 5MB of data) too much for Python to grab at once? Am I stuck iterating through one at a time? Or is there something I can do to speed things up?

EDIT: Here's some console output

12:56:26.019: Adding new column 'weight' and related index to r_ele
12:56:26.019: Querying database
12:56:28.079: Starting weight calculations
12:56:28.079: Weight: 1.0
12:56:28.079: Weight: 0.5
12:56:28.080: Weight: 0.5
12:56:28.338: Weight: 1.0
12:56:28.339: Weight: 3.0
12:56:28.843: Weight: 1.5
12:56:28.844: Weight: 1.0
12:56:28.844: Weight: 0.5
12:56:28.844: Weight: 0.5
12:56:28.845: Weight: 0.5
12:56:29.351: Weight: 0.5
12:56:29.855: Weight: 0.5
12:56:29.856: Weight: 1.0
12:56:30.371: Weight: 0.5
12:56:30.885: Weight: 0.5
12:56:31.146: Weight: 0.5
12:56:31.650: Weight: 1.0
12:56:32.432: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.952: Weight: 1.0
12:56:33.454: Weight: 0.5
12:56:33.455: Weight: 0.5
12:56:33.455: Weight: 1.0
12:56:33.716: Weight: 0.5
12:56:33.716: Weight: 1.0

And here's the SQL query:

//...snip (it wasn't the culprit)...

The output of EXPLAIN QUERY PLAN from SQLiteStudio:

0   0   0   SCAN TABLE r_ele AS re USING COVERING INDEX r_ele_fk (~500000 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 1
1   0   0   SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 2
2   0   0   SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
2   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 3
3   0   0   SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
3   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 4
4   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 5
5   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 6
6   0   0   SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 7
7   0   0   SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
7   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 8
8   0   0   SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
8   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 9
9   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
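For reference, the same plan can be reproduced from inside the script by prefixing the statement with EXPLAIN QUERY PLAN (a minimal sketch; query is assumed to be a variable holding the SELECT string from above):

for step in cursor.execute("EXPLAIN QUERY PLAN " + query):
    print(step)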
Answer

SQLite computes result records on the fly. fetchone is slow because it has to execute all the subqueries for each record in r_ele. fetchall appears to hang for the same reason: it does all of that per-row work up front, so it takes just as long as calling fetchone once for every record, with no output in between.
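Batching with fetchmany() does not avoid that per-row work either; it only makes the progress visible while the rows are being computed (a sketch, using the cursor from the question):

while True:
    batch = cursor.fetchmany(1000)
    if not batch:
        break
    for row in batch:
        weight = float(row[3] + row[4]) / 2
        printStatus("Weight: " + str(weight))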

SQLite 3.7.13 (the newer library that SQLiteStudio evidently uses) estimates that the lookups on the value column would be horribly slow, and therefore creates a temporary index for this query: that is the AUTOMATIC COVERING INDEX in the query plan above. The SQLite 3.6.21 linked into your Python predates automatic indexes (they were introduced in 3.7.0), so you should create a permanent index that it can use:

CREATE INDEX idx_k_ele_value ON k_ele(value);
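If you'd rather keep everything in the script, the index can be created there before the big SELECT runs (a minimal sketch; the database file name is a placeholder):

import sqlite3

conn = sqlite3.connect("dictionary.db")  # placeholder file name
cur = conn.cursor()

# IF NOT EXISTS makes this safe to run on every start-up
cur.execute("CREATE INDEX IF NOT EXISTS idx_k_ele_value ON k_ele(value)")
conn.commit()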

If that does not help, update to a Python build with a newer SQLite version, or use another database library that comes with a newer SQLite built in, such as APSW.
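To check which SQLite library a given Python is actually linked against, and what APSW provides instead, something like this works (apsw.sqlitelibversion() is the spelling from APSW releases of that era; newer releases may name it differently):

import sqlite3
print("sqlite3 module uses SQLite " + sqlite3.sqlite_version)

import apsw  # only if APSW is installed
print("APSW uses SQLite " + apsw.sqlitelibversion())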

