I want to be able to search a Seq object for a subsequnce Seq object accounting for ambiguity codes. For example, the following should be true:
from Bio.Seq import Seq
from Bio.Alphabet.IUPAC import IUPACAmbiguousDNAamb = IUPACAmbiguousDNA()
s1 = Seq("GGAAAAGG", amb)
s2 = Seq("ARAA", amb) # R = A or G
print s1.find(s2)
If ambiguity codes were taken into account, the answer should be
>>> 2
But the answer i get is that no match is found, or
>>> -1
Looking at the biopython source code, it doesnt appear that ambiguity codes are taken into account, as the subseqeunce is converted to a string using the private _get_seq_str_and_check_alphabet method, then the built in string method find() is used. Of course if this is the case, the "R" ambiguity code will be taken as a literal "R", not an A or G.
I could figure out how to do this with a home made method, but it seems like something that should be taken care of in the biopython packages using its Seq objects. Is there something I am missing here.
Is there a way to search for sub sequence membership accounting for ambiguity codes?