Is it possible to do parallel reads on one h5py file using multiprocessing?

2024/9/30 19:30:31

I am trying to speed up reading chunks of an h5py dataset file into RAM. Right now I try to do this with the multiprocessing library:

import multiprocessing as mp

pool = mp.Pool(NUM_PROCESSES)
gen = pool.imap(loader, indices)

Where the loader function is something like this:

def loader(indices):
    with h5py.File("location", 'r') as dataset:
        x = dataset["name"][indices]

This actually sometimes works (meaning that the loading time is divided by the number of processes, so the reads are parallelized). However, most of the time it doesn't, and the loading time stays as high as when loading the data sequentially. Is there anything I can do to fix this? I know h5py supports parallel reads/writes through mpi4py, but I would like to know whether that is absolutely necessary for reads alone.

Answer

Parallel reads are fine with h5py, no need for the MPI version. But why do you expect a speed-up here? Your job is almost entirely I/O bound, not CPU bound. Parallel processes are not gonna help because the bottleneck is your hard disk, not the CPU. It wouldn't surprise me if parallelization in this case even slowed down the whole reading operation. Other opinions?
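One way to check whether the disk really is the bottleneck is to time the same reads sequentially and through a pool. The following is only a rough sketch, not a definitive benchmark: FILE_PATH and DATASET_NAME are the placeholders from the question, and make_slices, timed and compare are hypothetical helper names introduced here. It assumes the dataset is read in contiguous row blocks and that each worker opens its own read-only file handle.

```python
import multiprocessing as mp
import time

FILE_PATH = "location"  # placeholder path taken from the question
DATASET_NAME = "name"   # placeholder dataset name taken from the question


def make_slices(n_rows, rows_per_read):
    """Split range(n_rows) into contiguous (start, stop) pairs."""
    return [(start, min(start + rows_per_read, n_rows))
            for start in range(0, n_rows, rows_per_read)]


def loader(bounds):
    """Open the file inside the calling process and read one row block."""
    import h5py  # imported here so each worker opens its own file handle
    start, stop = bounds
    with h5py.File(FILE_PATH, "r") as f:
        return f[DATASET_NAME][start:stop]


def timed(label, func):
    """Run func(), print the elapsed wall-clock time, return its result."""
    t0 = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - t0:.2f} s")
    return result


def compare(n_rows, rows_per_read, num_processes):
    """Time the same block reads sequentially and with a process pool."""
    jobs = make_slices(n_rows, rows_per_read)
    timed("sequential", lambda: [loader(j) for j in jobs])
    with mp.Pool(num_processes) as pool:
        timed("parallel", lambda: pool.map(loader, jobs))
```

Call compare(...) from under an `if __name__ == "__main__":` guard so that it also works where processes are spawned rather than forked. Keep in mind that the operating system may cache file contents after the first pass, so whichever variant runs second can look faster than it would on a cold read; that may also be part of why the speed-up only shows up some of the time.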
