by brtv 1106 days ago
Interesting! I'm currently working on a system that writes time-series data to raw binary files, but we're considering switching to a different file format for the same reasons. Have you considered any other formats, such as HDF5?

The one requirement I'd caution you on is resilience against write failures, in case you're collecting time-series data and can't afford to lose a "session" or spend time messing with recovery options. HDF5 is not made for that. Raw binary files and SQLite are both better in that respect, and SQLite beats both raw binary and HDF5 on usability.

https://cyrille.rossant.net/moving-away-hdf5/

I'm not familiar with HDF5 but agree on resiliency. We are a collegiate racing team, and our car's power rails aren't stable or redundant at all times, so power loss is something I've kept in mind from day one on my telemetry project. SQLite is generally equipped to handle power losses and write thread crashes:

> An SQLite database is highly resistant to corruption. If an application crash, or an operating-system crash, or even a power failure occurs in the middle of a transaction...

Quite often after a run the entire car is turned off and, on the next power-up, the databases are left as .db and .db-journal files. The code has no problem processing DBs in this state, or even continuing to log to them.
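For anyone who wants to see that behaviour without cutting power to a car, here is a rough sketch using Python's built-in sqlite3 module. Everything in it is made up for illustration: the `samples` table, the `run.db` file name, and the helper names. Killing a child process with `os._exit` is only a stand-in for a real power cut (the OS still flushes its buffers here), so treat it as a demonstration of the reopen-and-continue pattern, not a durability test.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Hypothetical logger that dies in the middle of a transaction.
CHILD = r"""
import os, sqlite3, sys
conn = sqlite3.connect(sys.argv[1])
conn.execute("CREATE TABLE IF NOT EXISTS samples (t REAL, value REAL)")
conn.execute("INSERT INTO samples VALUES (0.0, 1.0)")
conn.commit()                      # this row is committed
conn.execute("INSERT INTO samples VALUES (1.0, 2.0)")
os._exit(1)                        # abrupt exit: no commit, no close
"""

def simulate_unclean_shutdown(db_path):
    # Run the logger in a separate process so its death cannot clean up.
    subprocess.run([sys.executable, "-c", CHILD, db_path])

def resume_logging(db_path):
    # Simply reopening the file is enough; SQLite discards the
    # uncommitted work from the interrupted transaction on its own.
    conn = sqlite3.connect(db_path)
    before = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO samples VALUES (2.0, 3.0)")
    after = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    conn.close()
    return before, after

db_path = os.path.join(tempfile.mkdtemp(), "run.db")
simulate_unclean_shutdown(db_path)
before, after = resume_logging(db_path)
print(before, after)
```

The committed row survives, the half-written one does not, and logging carries on in the same file with no explicit recovery step in the application code.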

HDF5 is more for storing and exchanging numerical simulation data. It doesn't have to be resilient to write failures because worst-case scenario you rerun the simulation or try again to copy the data into the file.

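SQLite writes are atomic transactions. In the default rollback-journal mode, it first copies the original contents of the pages it is about to change into the -journal file, then updates the database file, and the transaction commits at the single moment the journal is deleted (or otherwise invalidated). If a write is interrupted before that point, the leftover journal is used on the next open to roll the database back, which is why you don't end up with partial data or a corrupted index.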
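The all-or-nothing part is easy to try from application code. A minimal sketch, again with Python's sqlite3 module and a made-up `samples` table and `log_batch` helper: a batch that fails halfway leaves no rows behind. The failure here is a constraint violation, not a power cut, so it only illustrates the transaction semantics, not crash recovery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (t REAL PRIMARY KEY, value REAL NOT NULL)"
)

def log_batch(conn, batch):
    """Insert a whole batch in one transaction; return True if it committed."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.executemany("INSERT INTO samples VALUES (?, ?)", batch)
        return True
    except sqlite3.IntegrityError:
        return False

ok = log_batch(conn, [(0.0, 1.0), (0.1, 1.1)])
# The second row violates NOT NULL, so the first row of this batch
# is rolled back along with it.
bad = log_batch(conn, [(0.2, 1.2), (0.3, None)])

count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(ok, bad, count)
```

Batching samples into one transaction like this also cuts down on the number of commits, at the cost of losing the current batch if power drops before the commit.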