Question 1

I want to be able to search a Seq object for a subsequnce Seq object accounting for ambiguity codes. For example, the following should be true:

from Bio.Seq import Seq
from Bio.Alphabet.IUPAC import IUPACAmbiguousDNAamb = IUPACAmbiguousDNA()
s1 = Seq("GGAAAAGG", amb)
s2 = Seq("ARAA", amb)     # R = A or G
print s1.find(s2)

If ambiguity codes were taken into account, the answer should be

>>> 2

But the answer i get is that no match is found, or

>>> -1

Looking at the biopython source code, it doesnt appear that ambiguity codes are taken into account, as the subseqeunce is converted to a string using the private _get_seq_str_and_check_alphabet method, then the built in string method find() is used. Of course if this is the case, the "R" ambiguity code will be taken as a literal "R", not an A or G.

I could figure out how to do this with a home made method, but it seems like something that should be taken care of in the biopython packages using its Seq objects. Is there something I am missing here.

Is there a way to search for sub sequence membership accounting for ambiguity codes?

Question 2

From what I can read from the documentation for Seq.find here:

http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#find

It appears that this method works similar to the str.find method in that it looks for exact match. So, while the dna sequence can contain ambiguity codes, the Seq.find() method will only return a match when the exact subsequence matches.

To do what you want maybe the ntsearch function will work:

Search for motifs with degenerate positions

Can Biopython perform Seq.find() accounting for ambiguity codes

Related Q&A

MySQL and lock a table, read, and then truncate

Train and predict on variable length sequences

What should a Python project structure look like for Travis CI to find and run tests?

Having trouble building a Dns Packet in Python

Element wise comparison between 1D and 2D array

Jquery ajax post request not working

Why is the main() function not defined inside the if main?

Using selenium inside gitlab CI/CD

VSCode issue with Python versions and environments from Jupyter Notebooks

weighted covariance matrix in numpy