
Stavros from TileDB, Inc. here: HDF5 is great software, and TileDB was heavily inspired by it. HDF5 probably works great for your use case. TileDB matches HDF5's performance in the dense case, but it also addresses some important limitations of HDF5, which may or may not be relevant to you. These include: sparse array support (not relevant to you); multiple readers and multiple writers through thread- and process-safety (HDF5 does not have full thread-safety, and it does not support parallel writes with compression; I am assuming you are using MPI with a single writer, though, so HDF5 should still work well for you); and efficient writes in a log-structured manner that enables multi-versioning and fault tolerance (HDF5 may suffer from file corruption upon error and from file fragmentation; you are probably not updating, so this is still not very relevant to you). Having said that, and echoing Jake's comment, we would love to hear from you how TileDB could be adapted to serve your case better.
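
To make the log-structured point a bit more concrete, here is a toy sketch in plain Python. It is not the TileDB API or its on-disk format; `Fragment`, `LogStructuredArray`, and the integer timestamps are hypothetical names made up for illustration. The only idea it shows is that each write appends a new immutable batch instead of updating in place, so older versions stay readable and a failed write cannot damage data that was already committed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fragment:
    """One immutable batch of cell writes, stamped with a logical time."""
    timestamp: int
    cells: dict  # maps a coordinate tuple to a value


@dataclass
class LogStructuredArray:
    """Toy append-only array (illustrative only, not TileDB's implementation)."""
    fragments: list = field(default_factory=list)

    def write(self, cells):
        # Append a new fragment rather than modifying an earlier one.
        ts = len(self.fragments) + 1
        self.fragments.append(Fragment(ts, dict(cells)))
        return ts

    def read(self, coord, at=None):
        # Scan newest-first; the latest fragment at or before `at` wins.
        for frag in reversed(self.fragments):
            if (at is None or frag.timestamp <= at) and coord in frag.cells:
                return frag.cells[coord]
        return None  # cell was never written


arr = LogStructuredArray()
t1 = arr.write({(0, 0): 1.0, (0, 1): 2.0})  # initial write
t2 = arr.write({(0, 1): 5.0})               # later update of one cell
latest = arr.read((0, 1))                   # sees the update
old = arr.read((0, 1), at=t1)               # reads the array as of the first write
```

A real system would also need to persist fragments atomically and consolidate them over time; this sketch leaves both out.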

A general comment: TileDB’s vision goes beyond that of the HDF5 (or any scientific) format. Given the quantity of HDF5 data out there, though (and the fact that we like the software), we are thinking about building some integration with HDF5 (and NetCDF). For instance, you may be able to create a TileDB array by “pointing” to an HDF5 dataset, without having to ingest the HDF5 files, while still enjoying the TileDB API and extra features.