I have a data set which is a large unweighted cyclic graph The cycles occur in loops of about 5-6 paths. It consists of about 8000 nodes and each node has from 1-6 (usually about 4-5) connections. I'm doing single pair shortest path calculations and have implemented the following code to do a breadth-first search.
from Queue import Queueq = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'# path finding
q.put(fromNode)
parent[fromNode] = 'Root'while not q.empty():# get the next node and add its neighbours to queuecurrent = q.get()for i in getNeighbours(current):# note parent and only continue if not already visitedif i[0] not in parent:parent[i[0]] = currentq.put(i[0])# check if destinationif current == toNode:print 'arrived at', toNodebreak
The above code uses the Python 2.6 Queue module and getNeighbours() is simply a subroutine that makes a single MySQL call and returns the neighbours as a list of tuples e.g. (('foo',),('bar',)). The SQL call is quick.
The code works ok however testing to down to depths of about 7 layers takes about 20 seconds to run (2.5GHz Intel 4GB RAM OS X 10.6)
I'd welcome any comments about how to improve the run time of this code.