Can Biopython perform Seq.find() accounting for ambiguity codes

2024/9/20 20:42:26

I want to be able to search a Seq object for a subsequnce Seq object accounting for ambiguity codes. For example, the following should be true:

from Bio.Seq import Seq
from Bio.Alphabet.IUPAC import IUPACAmbiguousDNAamb = IUPACAmbiguousDNA()
s1 = Seq("GGAAAAGG", amb)
s2 = Seq("ARAA", amb)     # R = A or G
print s1.find(s2)

If ambiguity codes were taken into account, the answer should be

>>> 2

But the answer i get is that no match is found, or

>>> -1

Looking at the biopython source code, it doesnt appear that ambiguity codes are taken into account, as the subseqeunce is converted to a string using the private _get_seq_str_and_check_alphabet method, then the built in string method find() is used. Of course if this is the case, the "R" ambiguity code will be taken as a literal "R", not an A or G.

I could figure out how to do this with a home made method, but it seems like something that should be taken care of in the biopython packages using its Seq objects. Is there something I am missing here.

Is there a way to search for sub sequence membership accounting for ambiguity codes?

Answer

From what I can read from the documentation for Seq.find here:

http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#find

It appears that this method works similar to the str.find method in that it looks for exact match. So, while the dna sequence can contain ambiguity codes, the Seq.find() method will only return a match when the exact subsequence matches.

To do what you want maybe the ntsearch function will work:

Search for motifs with degenerate positions

https://en.xdnf.cn/q/72103.html

Related Q&A

MySQL and lock a table, read, and then truncate

I am using mysqldb in python.I need to do the following for a table.1) Lock 2) Read 3) Truncate the table 4) UnlockWhen I run the below code, I get the below error. So, I am rather unsure on how to lo…

Train and predict on variable length sequences

Sensors (of the same type) scattered on my site are manually reporting on irregular intervals to my backend. Between reports the sensors aggregate events and report them as a batch. The following datas…

What should a Python project structure look like for Travis CI to find and run tests?

I currently have a project with the following .travis.yml file:language: python install: "pip install tox" script: "tox"Locally, tox properly executes and runs 35 tests, but on Trav…

Having trouble building a Dns Packet in Python

Im trying to build a dns packet to send over a socket. I dont want to use any libraries because I want direct access to the socket variable that sends it. Whenever I send the DNS packet, wireshark says…

Element wise comparison between 1D and 2D array

Want to perform an element wise comparison between an 1D and 2D array. Each element of the 1D array need to be compared (e.g. greater) against the corresponding row of 2D and a mask will be created. He…

Jquery ajax post request not working

I have a simple form submission with ajax, but it keeps giving me an error. All the error says is "error". No code, no description. No nothing, when I alert it when it fails.Javascript with …

Why is the main() function not defined inside the if __main__?

You can often see this (variation a):def main():do_something()do_sth_else()if __name__ == __main__:main()And I am now wondering why not this (variation b):if __name__ == __main__:do_something()do_sth_e…

Using selenium inside gitlab CI/CD

Ive desperetaly tried to set a pytest pipeline CI/CD for my personal projet hosted by gitlab. I tried to set up a simple project with two basic files: file test_core.py, witout any other dependencies f…

VSCode issue with Python versions and environments from Jupyter Notebooks

Issue: I am having issues with the environment and version of Python not matching the settings in VSCode, and causing issues with the packages I am trying to use in Jupyter notebooks. I am using a Wind…

weighted covariance matrix in numpy

I want to compute the covariance C of n measurements of p quantities, where each individual quantity measurement is given its own weight. That is, my weight array W has the same shape as my quantity ar…