Finding common IDs (intersection) in two dictionaries

2024/10/14 12:26:46

I wrote a piece of code that is supposed to find common intersecting ID's in line[1] in two different files. On my small sample files it works OK, but on my bigger files does not. I cannot figure out why, can you suggest me what is wrong? The exact problem is when my input is i.e. 200 it gives me 90 intersections, if I reduce it to 150, it gives me intersections of 110, logically it cannot be higher.

fileA = open("file1.txt",'r')
fileB = open("file2.txt",'r')
output = open("result.txt",'w')
#fileA.next()dictA = dict()
for line1 in fileA:listA = line1.split('\t')dictA[listA[1]] = listAdictB = dict()
for line1 in fileB:listB = line1.split('\t')dictB[listB[1]] = listBfor key in set(dictA).intersection(dictB):output.write(dictB[key][0]+'\t'+dictA[key][1]+'\t'+dictA[key][4]+'\t'+dictA[key][5]+'\t'+dictA[key][9]+'\t'+dictA[key][10]+'\n')

My file1 is sorted by line[0] and has 0-15 lines, to make it simpler here I give an example putting only line[0] and line[1],

contig17    GRMZM2G052619_P03  x x x x x x x x x x x x x x
contig33    AT2G41790.1    x x x x x x x x x x x x x x
contig98    GRMZM5G888620_P01  x x x x x x x x x x x x x x  
contig102   GRMZM5G886789_P02  x x x x x x x x x x x x x x  
contig123   AT3G57470.1    x x x x x x x x x x x x x x

My file2 is not sorted and has 0-10 line, I give only line[1]

y GRMZM2G052619_P03 y y y y y y y y         
y GRMZM5G888620_P01 y y y y y y y y     
y GRMZM5G886789_P02 y y y y y y y y     

My desired output,

contig17    GRMZM2G052619_P03  y y y y
contig98    GRMZM5G888620_P01  y y y y  
contig102   GRMZM5G886789_P02  y y y y  
Answer

Pay attention to this:

output.write(dictB[key][0]+'\t'+dictA[key][1]

It means you print file2 first column than file1 second column. It doesn't correspond with your examples and desired output.

As for intersection routine, it looks quite correct, so probably it's something wrong with your file. Are you sure all keys are unique? What do you mean by "reduce to 150" - do you mean just deleting some lines from this very file.

Also better replace

for key in set(dictA).intersection(dictB):

with

for key in dictA:if key in dictB:

It's actually the same, but should be faster and spends less memory.

https://en.xdnf.cn/q/117956.html

Related Q&A

Run command line containing multiple strings from python script

Hello i am trying to autogenerate a PDF, i have made a python script that generates the wanted PDF but to generate it i have to call my_cover.py -s "Atsumi" -t "GE1.5s" -co "Ja…

Identify value across multiple columns in a dataframe that contain string from a list in python

I have a dataframe with multiple columns containing phrases. What I would like to do is identify the column (per row observation) that contains a string that exists within a pre-made list of words. Wi…

ipython like interpreter for ruby

I come from python background and am learning ruby. IPython is really awesome. I am new to ruby now, and wanted to have some sort of ipython things. As of now am having tough time, going along ruby lin…

Django dynamic verification form

Im trying to create a verification form in Django that presents a user with a list of choices, only one of which is valid.For example, for a user whose favourite pizza toppings includes pineapple and r…

Any method to denote object assignment?

Ive been studying magic methods in Python, and have been wondering if theres a way to outline the specific action of:a = MyClass(*params).method()versus:MyClass(*params).method()In the sense that, perh…

Delete lines found in file with many lines

I have an Excel file (.xls) with many lines (1008), and Im looking for lines that have anything with 2010. For example, there is a line that contains 01/06/2010, so this line would be deleted, leaving …

Combining semaphore and time limiting in python-trio with asks http request

Im trying to use Python in an async manner in order to speed up my requests to a server. The server has a slow response time (often several seconds, but also sometimes faster than a second), but works …

Import of SWIG python module fails with apache

Importing a python mdule throws an exception in django when I run with apache. The same source code works fine with the django development server. I can also import the module from the command line. Th…

Pro-Football-Reference Team Stats XPath

I am using the scrapy shell on this page Pittsburgh Steelers at New England Patriots - September 10th, 2015 to pull individual team stats. For example, I want to pull total yards for the away team (46…

How to delete the last item of a collection in mongodb

I made a program with python and mongodb to do some diaries. Like thisSometimes I want to delete the last sentence, just by typing "delete!" But I dont know how to delete in a samrt way. I do…