Limit on number of HDF5 Datasets

2024/10/10 20:21:26

Using h5py to create an HDF5 file with many datasets, I encounter a massive speed drop after about 2.88 million datasets. What is the reason for this?

I assume that the tree structure holding the datasets reaches a limit and then has to be reordered, which is very time-consuming.

Here is a short example:

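import time

import h5py

# "w" is given explicitly; recent h5py versions open read-only by default.
hdf5_file = h5py.File("C://TEMP//test.hdf5", "w")

barrier = 1
# time.clock() was removed in Python 3.8; perf_counter() replaces it here.
start = time.perf_counter()
for i in range(int(1e8)):
    hdf5_file.create_dataset(str(i), [])
    td = time.perf_counter() - start
    if td > barrier:
        # About once per second: elapsed seconds and datasets created so far.
        print("{}: {}".format(int(td), i))
        barrier = int(td) + 1

    if td > 600:  # cancel after 600 s
        break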

Time measurement for key creation

edit:

By grouping the datasets, this limitation can be avoided:

import time

import h5py

max_n_keys = int(1e7)
max_n_group = int(1e5)

hdf5_file = h5py.File("C://TEMP//test.hdf5", "w")
group_key = str(max_n_group)
hdf5_file.create_group(group_key)

barrier = 1
# time.clock() was removed in Python 3.8; perf_counter() replaces it here.
start = time.perf_counter()
for i in range(max_n_keys):
    # Start a new group once the current one has reached its upper index.
    if i > max_n_group:
        max_n_group += int(1e5)
        group_key = str(max_n_group)
        hdf5_file.create_group(group_key)

    hdf5_file[group_key].create_dataset(str(i), data=[])

    td = time.perf_counter() - start
    if td > barrier:
        print("{}: {}".format(int(td), i))
        barrier = int(td) + 1

Time measurement for key creation with grouping
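
The bucketing idea behind the grouped version can also be written as a small pure function, which makes it easy to find a dataset again later. This is only a sketch: the names `group_key_for` and `dataset_path_for` are hypothetical, and the bucket boundaries are a simplified variant of the loop above (each group holds exactly `group_size` indices, whereas the loop puts one extra index into its first group).

```python
def group_key_for(index, group_size=100_000):
    """Return the name of the group that dataset `index` falls into.

    Groups are named after their exclusive upper bound, so indices
    0..99999 map to "100000", 100000..199999 map to "200000", and so on.
    """
    return str((index // group_size + 1) * group_size)


def dataset_path_for(index, group_size=100_000):
    """Return the "group/dataset" path for dataset `index` inside the file."""
    return "{}/{}".format(group_key_for(index, group_size), index)
```

With an open file, a lookup would then be something like `hdf5_file[dataset_path_for(123456)]`, assuming the file was written with the same bucketing rule.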

Answer

Following the HDF5 Group's documentation on metadata caching, I was able to push back the point at which performance drops drastically. Basically, I called H5Fset_mdc_config() (in C/C++; I don't know how to access the equivalent HDF5 function from Python) and changed the max_size value in the config parameter to 128*1024*124.

Doing so, I was able to create 4 times more datasets.
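
For Python, a rough equivalent may be available through h5py's low-level API. The sketch below assumes that the file's `id` object exposes `get_mdc_config()` and `set_mdc_config()` as wrappers around H5Fget_mdc_config() and H5Fset_mdc_config(), and that the returned config object has a writable `max_size` attribute. The function name `raise_metadata_cache_limit` is hypothetical, and the approach has not been benchmarked against the numbers above.

```python
def raise_metadata_cache_limit(h5_file, max_size_bytes):
    """Set the metadata cache's upper size bound on an open h5py.File.

    Assumption: h5_file.id provides get_mdc_config()/set_mdc_config(),
    h5py's low-level wrappers for the HDF5 C calls named in the answer.
    Returns the max_size value the file reports afterwards.
    """
    config = h5_file.id.get_mdc_config()  # current cache settings
    config.max_size = max_size_bytes      # change only the upper bound
    h5_file.id.set_mdc_config(config)     # hand the modified config back
    return h5_file.id.get_mdc_config().max_size
```

It would be called right after opening the file and before creating any datasets, for example `raise_metadata_cache_limit(hdf5_file, max_size_bytes)` with the byte count of your choice. HDF5 validates the config, so a value that conflicts with the other cache settings (for instance one below the configured minimum size) is expected to be rejected.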

Hope it helps.
