Question 1

I have a data set which is a large unweighted cyclic graph. The cycles occur in loops of about 5-6 paths. It consists of about 8000 nodes and each node has from 1-6 (usually about 4-5) connections. I'm doing single-pair shortest-path calculations and have implemented the following code to do a breadth-first search.

from Queue import Queue

q = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'

# path finding
q.put(fromNode)
parent[fromNode] = 'Root'

while not q.empty():
    # get the next node and add its neighbours to queue
    current = q.get()
    for i in getNeighbours(current):
        # note parent and only continue if not already visited
        if i[0] not in parent:
            parent[i[0]] = current
            q.put(i[0])

    # check if destination
    if current == toNode:
        print 'arrived at', toNode
        break

The above code uses the Python 2.6 Queue module and getNeighbours() is simply a subroutine that makes a single MySQL call and returns the neighbours as a list of tuples e.g. (('foo',),('bar',)). The SQL call is quick.

The code works OK; however, testing down to depths of about 7 layers takes about 20 seconds to run (2.5GHz Intel, 4GB RAM, OS X 10.6).

I'd welcome any comments about how to improve the run time of this code.

Answer

Well, given the upvotes on the comment, I'll make it an answer now.

The SQL in the tight loop is definitely slowing you down. I don't care how fast the call is. Think about it -- you're asking for a query to be parsed, a lookup to be run -- as fast as that is, it's still in a tight loop. What does your data set look like? Can you just SELECT the entire data set into memory, or at least work with it outside of MySQL?

If you work with that data in memory, you will see a significant performance gain.
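As a rough sketch of what "work with it in memory" could look like: load every edge once, build an adjacency map, and run the search against that map instead of calling the database per node. Everything named here is an assumption for illustration, not the asker's actual schema: the `edge_rows` list stands in for the result of one bulk query (the table and column names in the comment are hypothetical), the edges are assumed to be undirected, and `build_adjacency` and `shortest_path` are made-up helper names. It also uses `collections.deque` rather than `Queue`, since no thread safety is needed here.

```python
from collections import defaultdict, deque


def build_adjacency(edge_rows):
    """Build an adjacency map from (node_a, node_b) rows.

    Assumes edges are undirected, so each row is stored both ways.
    """
    adjacency = defaultdict(set)
    for a, b in edge_rows:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def shortest_path(adjacency, from_node, to_node):
    """Breadth-first search over the in-memory map.

    Returns the list of nodes from from_node to to_node,
    or None if to_node cannot be reached.
    """
    parent = {from_node: None}
    queue = deque([from_node])
    while queue:
        current = queue.popleft()
        if current == to_node:
            # Walk the parent links back to the start node.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbour in adjacency.get(current, ()):
            if neighbour not in parent:
                parent[neighbour] = current
                queue.append(neighbour)
    return None


# Stand-in for the rows that a single bulk query such as
# "SELECT node_a, node_b FROM edges" (hypothetical table and columns)
# might return. Only the two endpoint names come from the question.
edge_rows = [
    ('E1123', 'B1'), ('B1', 'B2'), ('B2', 'A3455'),
    ('E1123', 'C1'), ('C1', 'B2'),
]
adjacency = build_adjacency(edge_rows)
path = shortest_path(adjacency, 'E1123', 'A3455')
```

With roughly 8000 nodes and a handful of connections each, the whole map should fit comfortably in memory, and each neighbour lookup becomes a dictionary access instead of a query round trip.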

Can this breadth-first search be made faster?
