I am trying to learn greedy string tiling in algorithm
I have two lists as follows:
a=['a','b','c','d','e','f']
b=['d','e','a','b','c','f']
i would like to retrieve c=['a','b','c','d','e']
Another example would be
a = ['1','2','3','4','5','6','7','8','9','1','3']
b = ['3','4','5','2','1','7','8','9','1']
c
should be ['3','4','5','7','8','9','1']
Right now I am using the following code which works for the latter example but not the former. Can someone help?
def gct(a, b):if len(a) == 0 or len(b) == 0:return []if a[0] == b[0]:return [a[0]] + gct(a[1:], b[1:])return max(gct(a, b[1:]), gct(a[1:], b), key=len)
I am calling function with
gct( ['a','b','c','d','e','f'], ['d','e','a','b','c','f'] )
which gives ['a', 'b', 'c', 'f']
, when it should be ['a','b','c','d','e']
or ['d','e','a','b','c']
.
P.S The order doesn't matter in the printing of the result. The order is only important while doing the comparisons. The minimum pattern length should be 2
NOTE intersect will not solve my problem
The following codes can solve the problem. A minlength
is added to make sure the substr >= minlength
.
maxsub
is used to find the longest substr and return its index in string a
.
markit.a
is used to mark the char position found before in string a
.
while
loop is used to find all the substr (len>=minlength
)
^
#! /usr/bin/env python
a=['a','b','c','d','e','f']
b=['d','e','a','b','c','f']a = ['1','2','3','4','5','6','7','8','9','1','3']
b = ['3','4','5','2','1','7','8','9','1']def gct(a,b,minlength=2):if len(a) == 0 or len(b) == 0:return []# if py>3.0, nonlocal is betterclass markit:a=[0]minlen=2markit.a=[0]*len(a)markit.minlen=minlength#output char indexout=[]# To find the max length substr (index)# apos is the position of a[0] in origin stringdef maxsub(a,b,apos=0,lennow=0):if (len(a) == 0 or len(b) == 0):return []if (a[0]==b[0] and markit.a[apos]!=1 ):return [apos]+maxsub(a[1:],b[1:],apos+1,lennow=lennow+1)elif (a[0]!=b[0] and lennow>0):return []return max(maxsub(a, b[1:],apos), maxsub(a[1:], b,apos+1), key=len)while True:findmax=maxsub(a,b,0,0)if (len(findmax)<markit.minlen):breakelse:for i in findmax:markit.a[i]=1out+=findmaxreturn [ a[i] for i in out]print gct(a,b)
>> ['a', 'b', 'c', 'd', 'e']
>> ['7', '8', '9', '1', '3', '4', '5']
print gct(a,b,3)
>> ['a', 'b', 'c']
>> ['7', '8', '9', '1', '3', '4', '5']