I'm trying to extract lists/sublists from one bigger list of integers with Python 2.7 by using start and end patterns. I would like to do it with a function, but I can't find a library, algorithm, or regular expression for solving this problem.
def myFunctionForSublists(data, startSequence, endSequence):
    # ... todo

data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]
startSequence = [1,2,3]
endSequence = [4,5,6]
sublists = myFunctionForSublists(data, startSequence, endSequence)
print sublists[0] # [1, 2, 3, 99, 99, 99, 4, 5, 6]
print sublists[1] # [1, 2, 3, 99, 4, 5, 6]
Any ideas how I can implement this?
Here's a more general solution that doesn't require the lists to be sliceable, so you can use it on other iterables, such as generators.
We keep a deque the size of the start sequence until we come across it. Then we add those values to a list and keep iterating over the sequence. As we do, we keep a deque the size of the end sequence until we see it, also adding the elements to the list we're keeping. If we come across the end sequence, we yield that list and set the deque up to scan for the next start sequence.
from collections import deque

def gen(l, start, stop):
    start_deque = deque(start)
    end_deque = deque(stop)
    # Sliding window over the most recent len(start) items
    curr_deque = deque(maxlen=len(start))
    it = iter(l)
    for c in it:
        curr_deque.append(c)
        if curr_deque == start_deque:
            # Start sequence found: begin collecting a candidate sublist
            potential = list(curr_deque)
            curr_deque = deque(maxlen=len(stop))
            # Keep consuming the same iterator until the end sequence appears
            for c in it:
                potential.append(c)
                curr_deque.append(c)
                if curr_deque == end_deque:
                    yield potential
                    # Reset the window to scan for the next start sequence
                    curr_deque = deque(maxlen=len(start))
                    break

print(list(gen([99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99], [1,2,3], [4,5,6])))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
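To illustrate the point about non-sliceable input, here is a minimal self-contained sketch that feeds a generator to the same sliding-window approach. The names `extract_between` and `numbers` are made up for this example, and the sample data is shortened; treat it as one possible variant rather than a definitive implementation.

```python
from collections import deque

def extract_between(iterable, start, stop):
    """Yield each run that begins with `start` and ends with `stop`.

    Hypothetical helper name; same sliding-window idea as described above.
    """
    start, stop = list(start), list(stop)
    window = deque(maxlen=len(start))  # last len(start) items seen
    it = iter(iterable)
    for item in it:
        window.append(item)
        if list(window) == start:
            run = list(window)
            tail = deque(maxlen=len(stop))  # last len(stop) items of the run
            # Continue on the same iterator, so no item is read twice
            for item in it:
                run.append(item)
                tail.append(item)
                if list(tail) == stop:
                    yield run
                    window.clear()  # start scanning for the next run
                    break

def numbers():
    # A generator: it can only be iterated once, not sliced or indexed
    for n in [99, 1, 2, 3, 99, 4, 5, 6, 99, 1, 2, 3, 4, 5, 6]:
        yield n

result = list(extract_between(numbers(), [1, 2, 3], [4, 5, 6]))
print(result)
# [[1, 2, 3, 99, 4, 5, 6], [1, 2, 3, 4, 5, 6]]
```

Note that with this approach a run whose start sequence is found but whose end sequence never appears is silently dropped, because the input is exhausted before anything is yielded.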