by rspeer 3918 days ago
I haven't tried the BTables format, but I agree with their criticism of HDF5. It seems to be an incredibly over-designed format with under-designed APIs.

(Why would I need a directory tree inside a file that only one process can write to anyway? Why wouldn't I just use the filesystem I already have?)

> Why would I need a directory tree inside a file that only one process can write to anyway?
> Why wouldn't I just use the filesystem I already have?

If you have multiple "tables" that belong together and you need one table to interpret the data in the other table, wouldn't you want them to be grouped together? If they are separate files on the filesystem there is always the risk of forgetting something when you share the data with somebody.

If you can put all the data of an experiment into one file, I think that is very convenient. After all, you don't have to read the complete HDF5 file if you are interested just in a subset of the data.
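For what it's worth, here is a rough sketch of what reading only a subset looks like, assuming the h5py and numpy packages are installed. The group name `run1`, the dataset names, and the array sizes are all made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "experiment.h5")  # hypothetical file name

    # Write two related datasets into one group of a single file.
    with h5py.File(path, "w") as f:
        grp = f.create_group("run1")
        grp.create_dataset(
            "samples",
            data=np.arange(1000 * 8).reshape(1000, 8),
            chunks=(100, 8),       # stored on disk in blocks of 100 rows
            compression="gzip",
        )
        grp.create_dataset("channel_ids", data=np.arange(8))

    # Read back rows 10..19 of channel 3 only. Slicing the dataset object
    # reads just the chunks that overlap the selection, not the whole file.
    with h5py.File(path, "r") as f:
        samples = f["run1/samples"]
        total_shape = samples.shape
        subset = samples[10:20, 3]
```

The slice comes back as an ordinary numpy array, and the rest of the dataset is never loaded into memory.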

This is a poor reinvention of archive formats such as .zip.

If I have multiple data files that need to go together, I would like to put them together with a widely-understood tool that has good APIs in many programming languages and can even be interacted with from the shell.
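As a minimal sketch of what I mean, using only Python's standard `zipfile` module. The file names and table contents are invented for the example, and `bundle_tables` / `read_table` are just hypothetical helper names.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_tables(archive_path, tables):
    """Write each named table (CSV text) as one member of a single .zip file."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, csv_text in tables.items():
            zf.writestr(name, csv_text)


def read_table(archive_path, name):
    """Read a single member without extracting the rest of the archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(name).decode("utf-8")


# Hypothetical experiment: a measurements table plus the lookup table
# needed to interpret it.
tables = {
    "measurements.csv": "sensor_id,value\n1,0.5\n2,0.7\n",
    "sensors.csv": "sensor_id,unit\n1,volt\n2,ampere\n",
}

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "experiment.zip"
    bundle_tables(archive, tables)
    with zipfile.ZipFile(archive) as zf:
        member_names = sorted(zf.namelist())
    sensors_csv = read_table(archive, "sensors.csv")
```

The resulting file can also be inspected from the shell with `unzip -l experiment.zip`, which is the kind of tooling I'd want around shared data.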

I wonder whether it wouldn't be more practical to just use a sqlite file for data like that. I mean, it's not plaintext, but sqlite is available pretty much anywhere and provides a convenient interface for data.

HDF5 is much better suited for a lot of scientific data sets. How would you store multidimensional data in sqlite? Not everything is a table or matrix. HDF5 also allows you to pick compression filters that are especially suited for the data you have. If you are looking to replace a CSV file, then sqlite is obviously a pragmatic solution.
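To make the multidimensional question concrete: one workaround (an illustration only, not something anyone in this thread proposed) is to store one row per array cell, keyed by its indices. A sketch with Python's standard `sqlite3` module follows. The table name `cube` and the 2x3x4 shape are arbitrary.

```python
import sqlite3

# Hypothetical 2x3x4 array, kept as nested lists to avoid extra dependencies.
shape = (2, 3, 4)
cube = [[[i * 100 + j * 10 + k for k in range(shape[2])]
         for j in range(shape[1])]
        for i in range(shape[0])]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cube (i INTEGER, j INTEGER, k INTEGER, value REAL, "
    "PRIMARY KEY (i, j, k))"
)
# One row per cell: the three indices plus the value.
conn.executemany(
    "INSERT INTO cube VALUES (?, ?, ?, ?)",
    [(i, j, k, cube[i][j][k])
     for i in range(shape[0])
     for j in range(shape[1])
     for k in range(shape[2])],
)

# Reading the slice cube[1][:][2] becomes a filtered, ordered query.
rows = conn.execute(
    "SELECT value FROM cube WHERE i = ? AND k = ? ORDER BY j", (1, 2)
).fetchall()
slice_values = [row[0] for row in rows]
row_count = conn.execute("SELECT COUNT(*) FROM cube").fetchone()[0]
conn.close()
```

It works, but every cell carries its three indices as extra columns and each slice turns into a query, which is the kind of overhead a native array format avoids.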