I am trying to speed up reading chunks of a h5py dataset file (loading them into RAM). Right now I am trying to do this via the multiprocessing library:
pool = mp.Pool(NUM_PROCESSES)
gen = pool.imap(loader, indices)
Where the loader function is something like this:
def loader(indices):
    with h5py.File("location", 'r') as dataset:
        x = dataset["name"][indices]
        return x
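For reference, here is a more complete, self-contained version of what I run; the file path, dataset name, chunk size, and number of chunks are placeholders:

import multiprocessing as mp

import h5py
import numpy as np

FILE_PATH = "location"   # placeholder path to the HDF5 file
DATASET_NAME = "name"    # placeholder dataset name
NUM_PROCESSES = 4

def loader(indices):
    # Each worker opens its own read-only handle, so no handle is shared between processes
    with h5py.File(FILE_PATH, "r") as dataset:
        return dataset[DATASET_NAME][indices]

if __name__ == "__main__":
    # Each element of `indices` selects one block of 1000 rows to load
    indices = [np.arange(i * 1000, (i + 1) * 1000) for i in range(32)]
    with mp.Pool(NUM_PROCESSES) as pool:
        chunks = list(pool.imap(loader, indices))

(The real indices and dataset are of course different; this is just the shape of the setup.)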
This actually sometimes works, meaning that the expected loading time is divided by the number of processes, i.e. the reads are parallelized. However, most of the time it doesn't, and the loading time stays as high as when loading the data sequentially. Is there anything I can do to fix this? I know h5py supports parallel reads/writes through mpi4py, but I just want to know whether that is absolutely necessary for reads only as well.
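For completeness, this is roughly how I compare the two cases, reusing loader, indices, FILE_PATH, DATASET_NAME and NUM_PROCESSES from the sketch above; the sequential baseline just reads the same chunks one after another in a single process:

import time

def load_sequentially(all_indices):
    # Baseline: a single process reads the same chunks one after another
    with h5py.File(FILE_PATH, "r") as dataset:
        return [dataset[DATASET_NAME][idx] for idx in all_indices]

if __name__ == "__main__":
    start = time.perf_counter()
    load_sequentially(indices)
    print("sequential:", time.perf_counter() - start)

    start = time.perf_counter()
    with mp.Pool(NUM_PROCESSES) as pool:
        list(pool.imap(loader, indices))
    print("pool:", time.perf_counter() - start)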