I have a list with 155k files. When I call random.sample(list, 100), the results are not the same as the previous sample, but they look similar.
Is there a better alternative to random.sample that returns a new list of 100 random files?
import glob
import random

folders = get_all_folders('/data/gazette-txt-files')

# get all files from all folders
def get_all_files():
    files = []
    for folder in folders:
        files.append(glob.glob("/data/gazette-txt-files/" + folder + "/*.txt"))
    # convert 2D list into 1D
    formatted_list = []
    for file in files:
        for f in file:
            formatted_list.append(f)
    # 200 random text files
    return random.sample(formatted_list, 200)
For purposes like randomly selecting elements from a list, random.sample suffices. It does not provide true randomness, and I'm not sure true randomness is even theoretically possible.
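As a rough sanity check of that claim, the sketch below draws two independent samples of 100 from a stand-in list of 155,000 made-up paths and counts how many items they share. The paths and the helper name overlap_between_samples are hypothetical, not the asker's data or code. With these sizes the expected overlap is about 100 * 100 / 155000, which is well under one file, so samples that look alike may simply reflect file names that look alike.

```python
import random

# Hypothetical stand-in for the real list of 155k paths.
paths = [f"/data/example/file_{i}.txt" for i in range(155_000)]


def overlap_between_samples(items, k=100):
    """Draw two independent samples of size k and count the items they share."""
    first = random.sample(items, k)
    second = random.sample(items, k)
    return len(set(first) & set(second))


shared = overlap_between_samples(paths)
print(f"items shared by two samples of 100: {shared}")
```

Running this a few times should mostly print 0, occasionally 1.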
The random module (by default) uses a pseudo random number generator (PRNG) called the Mersenne Twister (MT). It is suitable for applications such as simulations (and minor things like picking from a list of paths), but it shouldn't be used where security is a concern, because it is deterministic and its output can be predicted.
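To make the determinism concrete, here is a minimal sketch that seeds two separate generators with the same value and compares their samples. The population, the seed value 42, and the helper name sample_with_seed are illustrative assumptions.

```python
import random

# Hypothetical stand-in for a large list of files.
population = list(range(155_000))


def sample_with_seed(seed, k=5):
    """Sample k items using a private Mersenne Twister seeded with `seed`."""
    rng = random.Random(seed)
    return rng.sample(population, k)


first = sample_with_seed(42)
second = sample_with_seed(42)
print(first == second)  # True: same seed, same sample
```

Without an explicit seed, the module seeds itself from an operating system source (or the clock as a fallback), which is why ordinary calls differ from run to run.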
This is why Python 3.6 also introduced the secrets module with PEP 506, which uses SystemRandom (backed by os.urandom) by default and is capable of producing cryptographically secure pseudo random numbers.
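If you want the same sampling behaviour drawn from the operating system's source instead of the Mersenne Twister, one possible sketch is below. The paths are made up for illustration. SystemRandom exposes the same sample method as the default generator, but it cannot be seeded, so results are not reproducible.

```python
import secrets

# Hypothetical stand-in for the real list of paths.
paths = [f"/data/example/file_{i}.txt" for i in range(155_000)]

# secrets.SystemRandom is the same class as random.SystemRandom.
sys_rng = secrets.SystemRandom()

# 100 distinct paths chosen using the OS randomness source.
picked = sys_rng.sample(paths, 100)

# A single path, using the convenience function from secrets.
one = secrets.choice(paths)

print(len(picked), one)
```

For picking files this is unlikely to look any different from random.sample. It mainly matters when the choice must be unpredictable.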
Of course, the bottom line is that whether you use a PRNG or a CSPRNG to generate your numbers, they're still going to be pseudo random.