Question 1

I have a file with many records like this:

>ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286
CTCAGACGTCGGGCCGACGCAAGGCCACGCGCGCGAACACACAGGTGCGGCCCCGGGCCA
CACGCACACCGTACAC
>ENSG00000001630|ENST00000003100|CYP51A1|3210|92134365|92134530
TATATCACAGTTTCTTTCTTTTTTTTTTTTTTTTTTTTGAGACAGAGTTTTGCTCTTGTT
GCCCAGGCTGGAGTACAGTGACGCAATCTCGGCTCACTGCAACCTTTGCCTCCCAGGTTC
>ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286
TTAACTATAATCCCACTGCCTATTTTTTTATTTCTAAAAATATCATAAAAAGACACAAAA

The first line of each record (starting with >) is the identifier, and the lines after it are that record's sequence. In the example above, ENSG00000100206 is the name and ENST00000216024 is the isoform. My file contains some identifier lines that share the same name while everything else differs. I would like to keep only the longest sequence for each name and write the result to a new file, so that each name appears exactly once (with its longest sequence). For the example above, the result would look like this:

>ENSG00000100206|ENST00000216024|DMC1|2371|38568257;38570043|38568289;38570286
CTCAGACGTCGGGCCGACGCAAGGCCACGCGCGCGAACACACAGGTGCGGCCCCGGGCCA
CACGCACACCGTACAC
>ENSG00000001630|ENST00000003100|CYP51A1|3210|92134365|92134530
TATATCACAGTTTCTTTCTTTTTTTTTTTTTTTTTTTTGAGACAGAGTTTTGCTCTTGTT
GCCCAGGCTGGAGTACAGTGACGCAATCTCGGCTCACTGCAACCTTTGCCTCCCAGGTTC

Does anyone know how to do that in Python?

Answer 1

You can start by using Biopython to get a proper FASTA format parser: http://biopython.org/wiki/SeqIO

Then iterate over the records and do what you want with them. This not only saves you the time of writing a parser, it also keeps you from getting the parsing wrong.

Example from that very page:

from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)

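Instead of printing, build a dict keyed by the name (the first |-separated field of record.id) whose value is the longest record seen so far, and replace an entry only when the new record's sequence is longer (compare len(record.seq)). Storing the record itself rather than only its length lets you write the kept records to a new file afterwards with SeqIO.write.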
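As a rough sketch of that idea, something like the following should work. The function names (gene_name, longest_per_name, filter_fasta) and the file names in the usage note are made up for illustration. It assumes the name is always the first |-separated field of the header and that headers contain no spaces, as in the example in the question. If two records with the same name have sequences of equal length, this version keeps the first one.

```python
def gene_name(record_id):
    """Return the first '|'-separated field, e.g. 'ENSG00000100206'."""
    return record_id.split("|")[0]


def longest_per_name(records):
    """Map each gene name to the record with the longest sequence.

    `records` is any iterable of objects with `.id` and `.seq`,
    such as the SeqRecord objects yielded by SeqIO.parse.
    """
    longest = {}
    for record in records:
        name = gene_name(record.id)
        best = longest.get(name)
        # Replace only on a strictly longer sequence, so ties keep
        # the first record seen for that name.
        if best is None or len(record.seq) > len(best.seq):
            longest[name] = record
    return longest


def filter_fasta(in_path, out_path):
    """Write the longest record per gene name from in_path to out_path."""
    # Imported here so the helpers above can be used (and tested)
    # without Biopython installed.
    from Bio import SeqIO

    longest = longest_per_name(SeqIO.parse(in_path, "fasta"))
    # dicts keep insertion order (Python 3.7+), so names are written
    # in the order they first appear in the input.
    return SeqIO.write(list(longest.values()), out_path, "fasta")
```

Calling filter_fasta("input.fasta", "longest.fasta") would then read the original file and write one record per name to the new file. SeqIO.write returns the number of records written, which is a quick way to check how many distinct names were found.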
