Filtering a text file in Python

2024/11/8 6:40:56

I have a file with many lines like this:

>ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286
CTCAGACGTCGGGCCGACGCAAGGCCACGCGCGCGAACACACAGGTGCGGCCCCGGGCCA
CACGCACACCGTACAC
>ENSG00000001630|ENST00000003100|CYP51A1|3210|92134365|92134530
TATATCACAGTTTCTTTCTTTTTTTTTTTTTTTTTTTTGAGACAGAGTTTTGCTCTTGTT
GCCCAGGCTGGAGTACAGTGACGCAATCTCGGCTCACTGCAACCTTTGCCTCCCAGGTTC
>ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286
TTAACTATAATCCCACTGCCTATTTTTTTATTTCTAAAAATATCATAAAAAGACACAAAA

The first line (starting with >) is the identifier and the following lines are its sequence; each identifier has its own sequence. In the example above, ENSG00000100206 is the name and ENST00000216024 is the isoform. My file contains several identifier lines that share the same name while everything else differs. I would like to keep only the longest sequence for each name and write the result to a new file, so that each name appears only once (with its longest sequence). For the example above, the result would look like this:

>ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286
CTCAGACGTCGGGCCGACGCAAGGCCACGCGCGCGAACACACAGGTGCGGCCCCGGGCCA
CACGCACACCGTACAC
>ENSG00000001630|ENST00000003100|CYP51A1|3210|92134365|92134530
TATATCACAGTTTCTTTCTTTTTTTTTTTTTTTTTTTTGAGACAGAGTTTTGCTCTTGTT
GCCCAGGCTGGAGTACAGTGACGCAATCTCGGCTCACTGCAACCTTTGCCTCCCAGGTTC

Do you guys know how to do that in Python?

Answer

You can start by using Biopython to get a proper FASTA format parser: http://biopython.org/wiki/SeqIO

Then iterate over the records and do what you want with them. This not only saves you the time of writing a parser, it also keeps you from getting the parsing completely wrong.

Example from that very page:

from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)
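For headers like the ones in the question, which contain no spaces, record.id should be the whole "|"-separated string after the ">". A small sketch of pulling the name and isoform out of such a string follows; split_header is a hypothetical helper name, and the field order is taken from the question's description rather than from any formal specification.

```python
def split_header(header):
    """Return (gene_id, transcript_id) from a '|'-separated FASTA header.

    Assumes the first field is the gene name and the second is the isoform,
    as described in the question. A leading '>' is tolerated.
    """
    fields = header.lstrip(">").split("|")
    return fields[0], fields[1]


header = ">ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286"
gene_id, transcript_id = split_header(header)
print(gene_id, transcript_id)
```

The gene ID returned here is what the records would be grouped by.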

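Instead of printing, build a dict keyed by the name, i.e. the first "|"-separated field of record.id, and store the record itself as the value. Replace an entry only when the new record's sequence is longer than the stored one. Note that a record has no length attribute; use len(record.seq). When the loop is done, write the dict's values to a new file with SeqIO.write.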
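The approach can be sketched roughly as below. Biopython is still the recommended parser; to keep the sketch runnable without it, read_fasta is a minimal stand-in of my own, and longest_per_gene and write_fasta are likewise hypothetical names. With Biopython installed, the pairs could instead come from ((rec.id, str(rec.seq)) for rec in SeqIO.parse(path, "fasta")). The demo data at the bottom is synthetic.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle.

    Minimal stand-in for a real parser: joins wrapped sequence lines
    and skips blank lines. The header is returned without the '>'.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_gene(records):
    """Map each gene ID (first '|' field) to its longest (header, seq)."""
    best = {}
    for header, seq in records:
        gene = header.split("|", 1)[0]
        # Replace only when strictly longer, so ties keep the first record.
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


def write_fasta(records, handle, width=60):
    """Write (header, seq) pairs as FASTA, wrapping sequences at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Synthetic demo: gene G1 appears twice, the first copy is longer.
demo = io.StringIO(">G1|T1\nACGTAC\nGT\n>G2|T2\nTTTT\n>G1|T3\nACGT\n")
result = io.StringIO()
write_fasta(longest_per_gene(read_fasta(demo)).values(), result)
print(result.getvalue(), end="")
```

For real files, the io.StringIO objects would be replaced by open("input.fasta") and open("output.fasta", "w"); both file names are placeholders. Because the dict keeps insertion order, genes are written in the order they first appear in the input.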
