Hacker News new | ask | show | jobs
by strogonoff 388 days ago
Different trade-offs is why it might make sense to embrace the Unix way for file formats: do one thing well, and document it so that others can do a different thing well with the same data and no loss.

For example, if it is an archival/recording oriented use case, then you make it cheap/easy to add data and possibly add some resiliency for when recording process crashes. If you want efficient random access, streaming, storage efficiency, the same dataset can be stored in a different layout without loss of quality—and conversion between them doesn’t have to be extremely optimal, it just should be possible to implement from spec.

Like, say, you record raw video. You want “all of the quality” and you know all in all it’s going to take terabytes, so bringing excess capacity is basically a given when shooting. Therefore, if some camera maker, in its infinite wisdom, creates a proprietary undocumented format to sliiightly improve on file size but “accidentally” makes it unusable in most software without first converting it using their own proprietary tool, you may justifiedly not appreciate it. (Canon Cinema Raw Light HQ—I kid you not, that’s what it’s called—I’m looking at you.)

On this note, what are the best/accepted approaches out there when it comes to documenting/speccing out file formats? Ideally something generalized enough that it can also handle cases where the “file” is in fact a particularly structured directory (a la macOS app bundle).

1 comments

Adding to the recording _raw_ video point, for such purposes, try to design the format so that losing a portion of the file doesn't render it entirely unusable. Kinda like how you can recover DV video from spliced tapes because the data for the current frame (+/- the bordering frame) is enough to start a valid new file stream.