by johannes1234321 51 days ago
When looking at established file formats, I'd start with zip over tarballs for that use case. Zip has compression and the ability to access any file directly. A tar file you have to decompress first.

SquashFS or cramfs or such have less tooling, which makes generating, inspecting, etc. more complex.


You only have to decompress it first if it's compressed (commonly using gzip, which is shown with the .gz suffix).

Otherwise, you can randomly access any file in a .tar as long as:
- the file is seekable/range-addressable
- you scan through it and build the file index first, either at runtime or in advance.
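For illustration, the scan-then-seek approach could look roughly like this in Python. This is only a sketch using the standard `tarfile` module; the helper names `build_tar_index` and `read_member` are made up for the example, and it assumes an uncompressed archive on a seekable file object.

```python
import io
import tarfile


def build_tar_index(fileobj):
    """Scan an uncompressed tar once; map member name -> (data offset, size)."""
    index = {}
    # Mode "r:" refuses compressed archives, which is what we want here.
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(fileobj, index, name):
    """Seek straight to a member's bytes using the prebuilt index."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)


# Build a small in-memory tar to try it out (hypothetical file names).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, payload in [("a.txt", b"alpha"), ("b.txt", b"bravo!")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
index = build_tar_index(buf)
print(read_member(buf, index, "b.txt"))
```

The index is the only part that needs a full pass over the archive; after that, each read is one seek plus one read.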

Uncompressed .tar is a reasonable choice for this application because the tools to read/write tar files are very standard, the file format is simple and well-documented, and it incurs no computational overhead.

You've just constructed your own crappy in-memory zip file, here. If you have to build your own custom index, you're no longer using the standard tools. If you find yourself building indices of tar files, and you control the creation, give yourself a break and use a zip file instead. It has the index built in. Compression is not required when packing files into a zip, if you don't want it.
Yeah it's pretty common to use zip files as purely a container format, with no compression enabled. You can even construct them in such a way that it's possible to memory map the contents directly out of the zip file, or read them over the network via a small number of range requests.
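A rough Python sketch of the memory-mapping idea, assuming the member is stored (not compressed) and not encrypted. The helper `stored_data_span` and the file name `assets.zip` are invented for the example. The offset arithmetic follows the zip local file header layout: 30 fixed bytes, then the file name, then the extra field.

```python
import mmap
import os
import struct
import tempfile
import zipfile


def stored_data_span(zip_path, name):
    """Return (offset, size) of a stored member's raw bytes inside the zip."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(name)
        if info.compress_type != zipfile.ZIP_STORED:
            raise ValueError(f"{name} is compressed; cannot map it directly")
        header_offset, size = info.header_offset, info.file_size
    with open(zip_path, "rb") as f:
        f.seek(header_offset)
        header = f.read(30)
    if header[:4] != b"PK\x03\x04":
        raise ValueError("local file header signature not found")
    # Name length sits at byte 26 and extra-field length at byte 28.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    return header_offset + 30 + name_len + extra_len, size


# Write a zip with compression disabled (hypothetical path and contents).
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "assets.zip")
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", b"hello from a stored member")

offset, size = stored_data_span(zip_path, "hello.txt")
with open(zip_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mapped = bytes(mm[offset:offset + size])
print(mapped)
```

The same (offset, size) pair is what you would put in an HTTP Range header if the archive lived on a remote server instead of on disk.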
> Uncompressed .tar is a reasonable choice for this application

Yes, uncompressed tar (with transfer compression, which is offered in HTTP) is an option for some amount of data.

Until the point where it isn't. Zip has similar benefits to tar (+ transfer compression), but the point where it fails in such a scenario comes later.

Zip allows you to set compression algorithm on a per-file basis, including no compression.
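As a small sketch of per-file compression with Python's standard `zipfile` module (the member names and payloads are placeholders):

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Already-compressed data: store it as-is.
    zf.writestr("logo.png", b"\x89PNG placeholder bytes",
                compress_type=zipfile.ZIP_STORED)
    # Repetitive text: deflate it.
    zf.writestr("notes.txt", b"text compresses well " * 50,
                compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buf) as zf:
    methods = {i.filename: i.compress_type for i in zf.infolist()}
    sizes = {i.filename: (i.file_size, i.compress_size) for i in zf.infolist()}

print(methods)
print(sizes)
```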
You can achieve the same with tar if you individually compress the files before adding them to the tarball instead of compressing the tarball itself.
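Something like the following, as a sketch with Python's `gzip` and `tarfile` modules. The helper `add_gzipped` and the `.gz` member naming are just conventions chosen for the example.

```python
import gzip
import io
import tarfile


def add_gzipped(tar, name, payload):
    """Gzip one payload on its own and add it as a regular tar member."""
    blob = gzip.compress(payload)
    info = tarfile.TarInfo(name + ".gz")
    info.size = len(blob)
    tar.addfile(info, io.BytesIO(blob))


buf = io.BytesIO()
# Mode "w" writes a plain tar; the container itself stays uncompressed.
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_gzipped(tar, "a.txt", b"alpha " * 100)
    add_gzipped(tar, "b.txt", b"bravo " * 100)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    member = tar.getmember("b.txt.gz")
    restored = gzip.decompress(tar.extractfile(member).read())

print(len(restored))
```

Each member can be decompressed on its own, but finding it still means walking the tar headers unless a separate index exists.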

I don’t see how that plus a small index of offsets would be notably more or less work than using a zip file.

Zip has a central directory you could just query, instead of having to construct one in-memory by scanning the entire archive. That's significantly less work.
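For comparison, here is roughly what querying the central directory looks like with Python's `zipfile`. Opening the archive reads the directory at the end of the file, and `infolist()` exposes it without touching the member data. The file names are placeholders.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    for i in range(3):
        zf.writestr(f"file{i}.txt", f"contents {i}".encode())

with zipfile.ZipFile(buf) as zf:
    # Name -> (local header offset, uncompressed size), straight from the
    # central directory; no scan over the member data is needed.
    directory = {i.filename: (i.header_offset, i.file_size)
                 for i in zf.infolist()}
    one = zf.read("file2.txt")

print(directory)
print(one)
```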
I mean if they include a pre-made index with it. For example, an uncompressed index at byte offset 0 in the tarball that lists what is inside and the offsets. It would still be a comparable amount of work to create software to do that with tar as to use a zip file, if fine-grained compression levels etc. are being used.
Romfs is more capable, simple to support, and doesn't have the overhead of tar's large headers and typical large blocking factors.
Zip is a piece of cake.

I needed to embed noVNC into an app recently in Go. Serving files via net/http from the zip file is practically a one-liner (then just a Gorilla websocket to take the place of websockify).

I second the idea to use a zip. In fact it's what a lot of vendors do because it is so ubiquitous. Even Microsoft, for example: the Office Open XML document format is just a zip file containing a bunch of folders and XML files.