I'm trying to pull out a random set of key-value pairs from a dictionary I made from a csv file. The dictionary contains information for genes, with the gene name being the dictionary key, and a list of numbers (related to gene expression etc.) being the value.
# python 2.7.5
import csv
import random

genes_csv = csv.reader(open('genes.csv', 'rb'))
genes_dict = {}
for row in genes_csv:
    genes_dict[row[0]] = row[1:]

length = raw_input('How many genes do you want? ')
for key in genes_dict:
    random_list = random.sample(genes_dict.items(), int(length))
    print random_list
The problem is, if I try to get a list of 100 genes (for example), it seems to iterate over the whole dictionary and return every possible combination of 100 genes.
If you want to get K random elements from a dictionary D, you simply use
import random
random.sample( D.items(), K )
and that's all you need.
From the Python documentation:
random.sample(population, k)
Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement.
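As a small illustration of the "without replacement" behaviour, here is a sketch with a made-up three-gene dictionary (the gene names and values are invented for the example). It is written so that it should also run on Python 3, where `items()` returns a view rather than a list and has to be wrapped in `list()` first; on Python 2.7 the wrapper is harmless.

```python
import random

# Hypothetical toy data: gene name -> list of expression-related values
toy_genes = {
    'geneA': ['1.2', '0.4'],
    'geneB': ['3.1', '2.2'],
    'geneC': ['0.7', '5.0'],
}

# list(...) is needed on Python 3; on Python 2.7 items() is already a list
pairs = random.sample(list(toy_genes.items()), 2)

# Each element is a (key, value) tuple, and no gene appears twice
picked_names = [name for name, values in pairs]
print(picked_names)
```

The output differs from run to run, but it always contains two distinct gene names.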
In your case
import csv
import random

genes_csv = csv.reader(open('genes.csv', 'rb'))
genes_dict = {}
for row in genes_csv:
    genes_dict[row[0]] = row[1:]

length = raw_input('How many genes do you want? ')
random_list = random.sample( genes_dict.items(), int(length) )
print random_list
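One thing to watch for, since the sample size comes from user input: `random.sample` raises a `ValueError` when asked for more elements than the population holds. A minimal sketch of one way to guard against that, assuming it is acceptable to return every gene when the request is too large (`pick_genes` is a hypothetical helper, not part of the original code):

```python
import random

def pick_genes(genes, k):
    """Return k random (name, values) pairs, or all pairs if k is too large.

    Hypothetical helper introduced only for this illustration.
    """
    items = list(genes.items())
    # random.sample raises ValueError when k exceeds the population size,
    # so cap k at the number of available genes
    k = min(k, len(items))
    return random.sample(items, k)

example = {'g1': ['1'], 'g2': ['2'], 'g3': ['3']}
print(len(pick_genes(example, 2)))    # 2
print(len(pick_genes(example, 100)))  # 3
```

Capping the size is only one option; asking the user again for a smaller number would work just as well.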
There is no need to iterate through all the keys of the dictionary:
for key in genes_dict:
    random_list = random.sample(genes_dict.items(), int(length))
    print random_list
Notice that you are not actually using the key variable inside your loop, which should warn you that something may be wrong here. It is not true, though, that it "return[s] every possible combination of 100 genes". It simply returns N random k-element gene lists (k = 100 in your case), where N is the size of the dictionary. That is far from "all combinations", of which there are N! / ((N-k)! k!).
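To get a feeling for the difference between the two numbers, here is a rough sketch that compares them for small, made-up sizes (`n_combinations` is a hypothetical helper written for this example):

```python
from math import factorial

def n_combinations(n, k):
    # Number of ways to choose k items out of n when order does not matter
    return factorial(n) // (factorial(k) * factorial(n - k))

n, k = 5, 2   # hypothetical sizes: 5 genes in the dictionary, samples of 2

print(n)                     # lists printed by the original loop: 5
print(n_combinations(n, k))  # distinct 2-gene subsets: 10
```

For a dictionary with thousands of genes and k = 100 the second number is astronomically large, while the loop still prints only N lists.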