Python slow on fetchone, hangs on fetchall


I'm writing a script that runs a SELECT query against a database and parses through ~33,000 records. Unfortunately, I'm running into problems at the cursor.fetchone()/cursor.fetchall() stage.

I first tried iterating through the cursor a record at a time like so:

# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
while True:
    # Get the next row in the cursor
    row = cursor.fetchone()
    if row is None:
        break
    # TODO: Determine if there's any kanji in row[2]
    weight = float(row[3] + row[4]) / 2
    printStatus("Weight: " + str(weight))

Based on the output of printStatus (it prints a timestamp plus whatever string is passed to it), the script took approximately 1 second to process each row. This led me to believe that the query was being re-run on each loop iteration (with a LIMIT 1 or something), since it took ~1 second for the same query to run once in something like SQLiteStudio *and* return all 33,000 rows. I calculated that, at that rate, it would take around 7 hours to get through all 33,000 records.
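A quick way to confirm that the time is going into the fetch itself rather than the loop body is to time a single call in isolation (a minimal sketch, reusing the cursor and printStatus from the snippet above):

import time

start = time.time()
row = cursor.fetchone()
printStatus("fetchone took %.3f seconds" % (time.time() - start))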

Instead of sitting through that, I tried to use cursor.fetchall() instead:

results = cursor.fetchall()

# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
for row in results:
    # TODO: Determine if there's any kanji in row[2]
    weight = float(row[3] + row[4]) / 2
    printStatus("Weight: " + str(weight))

Unfortunately, the Python executable locked up at 25% CPU and ~6MB of RAM when it got to the cursor.fetchall() line. I left the script running for ~10 minutes, but nothing happened.

Is ~33,000 returned rows (about 5MB of data) too much for Python to grab at once? Am I stuck iterating through one at a time? Or is there something I can do to speed things up?

EDIT: Here's some console output

12:56:26.019: Adding new column 'weight' and related index to r_ele
12:56:26.019: Querying database
12:56:28.079: Starting weight calculations
12:56:28.079: Weight: 1.0
12:56:28.079: Weight: 0.5
12:56:28.080: Weight: 0.5
12:56:28.338: Weight: 1.0
12:56:28.339: Weight: 3.0
12:56:28.843: Weight: 1.5
12:56:28.844: Weight: 1.0
12:56:28.844: Weight: 0.5
12:56:28.844: Weight: 0.5
12:56:28.845: Weight: 0.5
12:56:29.351: Weight: 0.5
12:56:29.855: Weight: 0.5
12:56:29.856: Weight: 1.0
12:56:30.371: Weight: 0.5
12:56:30.885: Weight: 0.5
12:56:31.146: Weight: 0.5
12:56:31.650: Weight: 1.0
12:56:32.432: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.952: Weight: 1.0
12:56:33.454: Weight: 0.5
12:56:33.455: Weight: 0.5
12:56:33.455: Weight: 1.0
12:56:33.716: Weight: 0.5
12:56:33.716: Weight: 1.0

And here's the SQL query:

//...snip (it wasn't the culprit)...

The output of EXPLAIN QUERY PLAN from SQLiteStudio:

0   0   0   SCAN TABLE r_ele AS re USING COVERING INDEX r_ele_fk (~500000 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 1
1   0   0   SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 2
2   0   0   SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
2   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 3
3   0   0   SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
3   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 4
4   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 5
5   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 6
6   0   0   SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 7
7   0   0   SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
7   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 8
8   0   0   SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
8   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 9
9   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
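For reference, the same plan can be reproduced from inside the script by prefixing the statement with EXPLAIN QUERY PLAN (a minimal sketch; query is assumed to be a variable holding the SELECT string from above):

for step in cursor.execute("EXPLAIN QUERY PLAN " + query):
    print(step)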
Answer

SQLite computes result records on the fly. fetchone is slow because it has to execute all the subqueries for each record in r_ele. fetchall appears to hang for the same reason: it does all of that per-row work up front, so it takes just as long as calling fetchone once for every record, with no output in between.
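Batching with fetchmany() does not avoid that per-row work either; it only makes the progress visible while the rows are being computed (a sketch, using the cursor from the question):

while True:
    batch = cursor.fetchmany(1000)
    if not batch:
        break
    for row in batch:
        weight = float(row[3] + row[4]) / 2
        printStatus("Weight: " + str(weight))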

SQLite 3.7.13 (the newer library that SQLiteStudio evidently uses) estimates that the lookups on the value column would be horribly slow, and therefore creates a temporary index for this query: that is the AUTOMATIC COVERING INDEX in the query plan above. The SQLite 3.6.21 linked into your Python predates automatic indexes (they were introduced in 3.7.0), so you should create a permanent index that it can use:

CREATE INDEX idx_k_ele_value ON k_ele(value);
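If you'd rather keep everything in the script, the index can be created there before the big SELECT runs (a minimal sketch; the database file name is a placeholder):

import sqlite3

conn = sqlite3.connect("dictionary.db")  # placeholder file name
cur = conn.cursor()

# IF NOT EXISTS makes this safe to run on every start-up
cur.execute("CREATE INDEX IF NOT EXISTS idx_k_ele_value ON k_ele(value)")
conn.commit()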

If that does not help, update to a Python build with a newer SQLite version, or use another database library that comes with a newer SQLite built in, such as APSW.
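To check which SQLite library a given Python is actually linked against, and what APSW provides instead, something like this works (apsw.sqlitelibversion() is the spelling from APSW releases of that era; newer releases may name it differently):

import sqlite3
print("sqlite3 module uses SQLite " + sqlite3.sqlite_version)

import apsw  # only if APSW is installed
print("APSW uses SQLite " + apsw.sqlitelibversion())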

