Hacker News
by kortex 2310 days ago
This is good info. I've been wary of HDF5 for some time. Nothing concrete (until this bug), but from my research it just consistently smelled fishy. The main turnoff for me was the possibility of data corruption bricking the entire dataset.

Pity, as it has on paper a lot of great concepts and features. Maybe it'll be mature enough someday, though my money is on something better from the ground up coming along.

Honestly, most of the portability advantage is moot nowadays. Chunked, S3-like storage, SMB, and the ability to copy files from ext to NTFS (at least on *nix) mean that sharing your data across platforms isn't the struggle it used to be. Windows is rapidly becoming, if it isn't already, a second-class citizen in science-data-heavy workflows.

I ended up going with a NAS and just file system primitives for my computer vision image workflow, works great.

https://stackoverflow.com/questions/35837243/hdf5-possible-d...

https://cyrille.rossant.net/moving-away-hdf5/

1 comment

> The main turnoff for me was the possibility of data corruption bricking the entire dataset.

A glib, high-level summary of my last job, which I held for 6 years, would be "write out HDF5 files". In that time, I don't recall seeing a true data corruption problem with HDF5.

Now, I ran into many other problems with HDF5, typically surrounding the newer features that came along in 1.10, and its threading limitations. The older folks at that job would mention historical issues with data corruption (often from reading files as they're being written to), but I never saw it myself.
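
For what it's worth, the "reading files as they're being written to" hazard also applies to the plain-files-on-a-NAS approach mentioned upthread. The usual workaround is to write to a temporary file and rename it into place. Below is a minimal sketch using only the Python standard library. The helper name `atomic_write_bytes` is made up for illustration, and it assumes the temporary file and the destination sit on the same filesystem. Rename semantics on SMB/NFS shares vary, so treat this as a sketch and not a guarantee.

```python
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write data to path so readers see the old file or the new one, never a partial one.

    Hypothetical helper for illustration; not part of any library.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the destination directory so the final
    # rename stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before publishing
        # os.replace overwrites the destination in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A writer would call `atomic_write_bytes("frame_0001.bin", payload)` once per file. Readers that open `frame_0001.bin` get either the previous complete version or the new one.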