I wrote a piece of code that is supposed to find the IDs in line[1] (the second tab-separated field of each line) that are common to two different files. It works on my small sample files, but not on my bigger files. I cannot figure out why; can you suggest what is wrong? The exact problem: when my input is, e.g., 200, it gives me 90 intersections, but if I reduce it to 150, it gives me 110 intersections. Logically, the count for the smaller input cannot be higher.
fileA = open("file1.txt",'r')
fileB = open("file2.txt",'r')
output = open("result.txt",'w')
#fileA.next()
dictA = dict()
for line1 in fileA:
    listA = line1.split('\t')
    dictA[listA[1]] = listA
dictB = dict()
for line1 in fileB:
    listB = line1.split('\t')
    dictB[listB[1]] = listB
for key in set(dictA).intersection(dictB):
    output.write(dictB[key][0]+'\t'+dictA[key][1]+'\t'+dictA[key][4]+'\t'+dictA[key][5]+'\t'+dictA[key][9]+'\t'+dictA[key][10]+'\n')
My file1 is sorted by line[0], and each line has fields 0-15. To keep it simple, the example below spells out only line[0] and line[1]:
contig17 GRMZM2G052619_P03 x x x x x x x x x x x x x x
contig33 AT2G41790.1 x x x x x x x x x x x x x x
contig98 GRMZM5G888620_P01 x x x x x x x x x x x x x x
contig102 GRMZM5G886789_P02 x x x x x x x x x x x x x x
contig123 AT3G57470.1 x x x x x x x x x x x x x x
My file2 is not sorted, and each line has fields 0-10. I spell out only line[1]:
y GRMZM2G052619_P03 y y y y y y y y
y GRMZM5G888620_P01 y y y y y y y y
y GRMZM5G886789_P02 y y y y y y y y
My desired output:
contig17 GRMZM2G052619_P03 y y y y
contig98 GRMZM5G888620_P01 y y y y
contig102 GRMZM5G886789_P02 y y y y
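
To make the intended result concrete, here is a minimal sketch of the join on the sample rows above. It is not a diagnosis of the count problem, only an illustration under several assumptions: fields are tab-separated, the `x` and `y` fillers are placeholders, file2 rows are padded to 11 fields, the four trailing output fields come from file2 fields 4, 5, 9 and 10, and the helper names `make_row` and `join_on_id` are hypothetical.

```python
def make_row(first, ident, filler, n_cols):
    """Build one tab-separated sample line with n_cols fields (placeholder data)."""
    return "\t".join([first, ident] + [filler] * (n_cols - 2))


# Stand-ins for file1.txt (fields 0-15) and file2.txt (fields 0-10).
file1_lines = [
    make_row("contig17", "GRMZM2G052619_P03", "x", 16),
    make_row("contig33", "AT2G41790.1", "x", 16),
    make_row("contig98", "GRMZM5G888620_P01", "x", 16),
    make_row("contig102", "GRMZM5G886789_P02", "x", 16),
    make_row("contig123", "AT3G57470.1", "x", 16),
]
file2_lines = [
    make_row("y", "GRMZM5G888620_P01", "y", 11),  # deliberately unsorted
    make_row("y", "GRMZM2G052619_P03", "y", 11),
    make_row("y", "GRMZM5G886789_P02", "y", 11),
]


def join_on_id(lines_a, lines_b, b_cols=(4, 5, 9, 10)):
    """Return output rows for IDs (field 1) present in both inputs."""
    # Index file2 rows by ID; a repeated ID keeps only its last row.
    rows_b = {}
    for line in lines_b:
        fields = line.rstrip("\n").split("\t")
        rows_b[fields[1]] = fields

    out = []
    # Walk file1 in its own order so the output stays sorted like file1.
    for line in lines_a:
        fields = line.rstrip("\n").split("\t")
        match = rows_b.get(fields[1])
        if match is not None:
            picked = [match[i] for i in b_cols]  # assumed file2 fields
            out.append("\t".join([fields[0], fields[1]] + picked))
    return out


result = join_on_id(file1_lines, file2_lines)
for row in result:
    print(row)
```

Iterating over file1 and looking each ID up in a dictionary built from file2 keeps the file1 order, whereas iterating over a set intersection gives no guaranteed order. Note that a dictionary keyed on the ID holds one row per distinct ID, so the number of keys can be smaller than the number of lines if an ID repeats.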