


	by johannes1234321 51 days ago
	When looking at established file formats, I'd start with zip over tarballs for that use case. Zip has compression and the ability to access any file directly. A tar file you have to decompress first. SquashFS, cramfs, or similar have less tooling, which makes generating, inspecting, etc. more complex.

3 comments

nrclark 51 days ago

You only have to decompress it first if it's compressed (commonly using gzip, which is shown with the .gz suffix).

Otherwise, you can randomly access any file in a .tar as long as:

- the file is seekable/range-addressable
- you scan through it and build the file index first, either at runtime or in advance.

Uncompressed .tar is a reasonable choice for this application because the tools to read/write tar files are very standard, the file format is simple and well-documented, and it incurs no computational overhead.
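
A minimal sketch of the scan-and-index approach described above, using Python's standard `tarfile` module. The helper names (`build_tar_index`, `read_member`, `make_demo_tar`) are made up for illustration; the sketch assumes an uncompressed tar held in a seekable file object.

```python
import io
import tarfile


def build_tar_index(fileobj):
    """Scan an uncompressed, seekable tar once; map name -> (data offset, size)."""
    index = {}
    # mode "r:" means "read, no compression", so a .tar.gz is rejected here.
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(fileobj, index, name):
    """Random access: seek straight to the member's data, no tar parsing."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)


def make_demo_tar():
    """Build a small in-memory tar so the sketch is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("a.txt", b"alpha"), ("dir/b.txt", b"bravo!")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    tar_file = make_demo_tar()
    idx = build_tar_index(tar_file)
    print(read_member(tar_file, idx, "dir/b.txt"))
```

The scan touches every header once; after that, each read is a single seek plus read, which maps directly onto an HTTP range request.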

electroly 51 days ago

You've just constructed your own crappy in-memory zip file, here. If you have to build your own custom index, you're no longer using the standard tools. If you find yourself building indices of tar files, and you control the creation, give yourself a break and use a zip file instead. It has the index built in. Compression is not required when packing files into a zip, if you don't want it.

marginalia_nu 51 days ago

Yeah, it's pretty common to use zip files purely as a container format, with no compression enabled. You can even construct them in such a way that it's possible to memory-map the contents directly out of the zip file, or read them over the network via a small number of range requests.
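
As a rough sketch of the memory-mapping idea, assuming the member was written with `ZIP_STORED` (no compression): the central directory gives the offset of the member's local header, and the raw bytes start right after that 30-byte header plus its name and extra fields. `stored_member_span` and `demo` are hypothetical helper names, not part of any library.

```python
import mmap
import os
import struct
import tempfile
import zipfile


def stored_member_span(path, name):
    """Return (offset, size) of an uncompressed member's raw bytes in the zip."""
    with zipfile.ZipFile(path) as zf:
        info = zf.getinfo(name)  # looked up via the central directory
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is compressed; cannot map it directly")
    with open(path, "rb") as f:
        f.seek(info.header_offset)
        header = f.read(30)  # fixed-size part of the local file header
    if header[:4] != b"PK\x03\x04":
        raise ValueError("local file header signature not found")
    # Name length and extra-field length live at bytes 26..29 of the header.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    return info.header_offset + 30 + name_len + extra_len, info.file_size


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bundle.zip")
        with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
            zf.writestr("hello.txt", b"hello from a stored member")
        offset, size = stored_member_span(path, "hello.txt")
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return bytes(mm[offset:offset + size])


if __name__ == "__main__":
    print(demo())
```

The same `(offset, size)` pair could drive a range request instead of a memory map.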

johannes1234321 51 days ago

> Uncompressed .tar is a reasonable choice for this application

Yes, uncompressed tar (with transfer compression, which HTTP offers) is an option for some amount of data.

Until it isn't. Zip has benefits similar to tar (+ transfer compression), but the point where it stops working for such a scenario comes later.

chungy 51 days ago

Zip allows you to set the compression algorithm on a per-file basis, including no compression.
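
For illustration, a small sketch of per-file compression with Python's standard `zipfile` module; `build_mixed_zip` and `compression_methods` are hypothetical helper names, and the file names and contents are invented.

```python
import io
import zipfile


def build_mixed_zip(entries):
    """entries: iterable of (name, data, compress_type) tuples."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data, method in entries:
            # compress_type overrides the archive-wide default for this member.
            zf.writestr(name, data, compress_type=method)
    return buf.getvalue()


def compression_methods(zip_bytes):
    """Report which method each member was stored with."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {info.filename: info.compress_type for info in zf.infolist()}


if __name__ == "__main__":
    blob = build_mixed_zip([
        ("notes.txt", b"text compresses well " * 50, zipfile.ZIP_DEFLATED),
        ("photo.jpg", b"\xff\xd8 already compressed", zipfile.ZIP_STORED),
    ])
    print(compression_methods(blob))
```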

QuantumNomad_ 51 days ago

You can achieve the same with tar if you individually compress the files before adding them to the tarball instead of compressing the tarball itself.

I don't see how that, plus a small index of offsets, would be notably more or less work than using a zip file.
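
A sketch of what that could look like with Python's standard `tarfile` and `gzip` modules: each file is gzipped on its own and stored in an uncompressed tar. The `.gz` suffix convention and the helper names `pack` and `read_one` are assumptions for the example. Note that `getmember` still walks the tar headers to find the entry; an offset index would be needed to avoid that.

```python
import gzip
import io
import tarfile


def pack(files):
    """files: dict of name -> bytes. Each member is gzipped individually."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:  # the tar itself is not compressed
        for name, data in files.items():
            blob = gzip.compress(data)
            info = tarfile.TarInfo(name + ".gz")
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))
    return buf.getvalue()


def read_one(archive, name):
    """Decompress a single member without touching the other members' data."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        member = tar.getmember(name + ".gz")
        return gzip.decompress(tar.extractfile(member).read())


if __name__ == "__main__":
    archive = pack({"a.txt": b"alpha " * 100, "b.txt": b"bravo"})
    print(read_one(archive, "b.txt"))
```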

chungy 51 days ago

Zip has a central directory you could just query, instead of having to construct one in-memory by scanning the entire archive. That's significantly less work.
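
To make the "just query it" point concrete, here is a hedged sketch that lists a zip's members using only two ranged reads: one for the tail (to find the end-of-central-directory record) and one for the central directory itself. `read_range` is a hypothetical callback standing in for a file seek or an HTTP range request. The sketch ignores ZIP64 and assumes UTF-8 or ASCII member names.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory record
CDFH_SIG = b"PK\x01\x02"  # central directory file header


def locate_central_directory(read_range, total_size):
    """Read only the archive's tail; return (entry count, cd offset, cd size)."""
    tail_len = min(total_size, 22 + 65535)  # EOCD is 22 bytes + optional comment
    tail = read_range(total_size - tail_len, tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("not a zip file (no end-of-central-directory record)")
    _sig, _disk, _cd_disk, _n_disk, n_total, cd_size, cd_offset, _comment_len = \
        struct.unpack("<4sHHHHIIH", tail[pos:pos + 22])
    return n_total, cd_offset, cd_size


def list_members(read_range, total_size):
    """Return {member name: local header offset} from the central directory."""
    _count, cd_offset, cd_size = locate_central_directory(read_range, total_size)
    cd = read_range(cd_offset, cd_size)
    members, pos = {}, 0
    while pos + 46 <= len(cd) and cd[pos:pos + 4] == CDFH_SIG:
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", cd, pos + 28)
        (local_offset,) = struct.unpack_from("<I", cd, pos + 42)
        name = cd[pos + 46:pos + 46 + name_len].decode("utf-8")
        members[name] = local_offset
        pos += 46 + name_len + extra_len + comment_len
    return members


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"alpha")
        zf.writestr("b.txt", b"bravo")
    data = buf.getvalue()
    print(list_members(lambda off, n: data[off:off + n], len(data)))
```

In practice `zipfile.ZipFile(...).infolist()` does this parsing for you; the manual version is only meant to show how little of the archive has to be read.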

QuantumNomad_ 51 days ago

I mean if they include a pre-made index with it, for example an uncompressed index at byte offset 0 in the tarball that lists the contents and their offsets. Writing software to do that with tar would still be a comparable amount of work to using a zip file, if fine-grained compression levels etc. are being used.
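
One way such a pre-made index could work, sketched with Python's standard `tarfile` module. Everything here is an assumption for illustration: the index is a JSON member named `INDEX.json` placed first in the archive (so its header sits at byte offset 0), and the offsets it stores are relative to the end of the index member, which avoids the index's own size changing the numbers it contains.

```python
import io
import json
import tarfile

BLOCK = 512
INDEX_NAME = "INDEX.json"  # hypothetical name for the index member


def _padded(n):
    """Tar pads member data up to a multiple of 512 bytes."""
    return (n + BLOCK - 1) // BLOCK * BLOCK


def _write_members(tar, files):
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


def pack_with_index(files):
    # Pass 1: write the payload alone to learn where each member's data lands.
    payload = io.BytesIO()
    with tarfile.open(fileobj=payload, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        _write_members(tar, files)
    payload.seek(0)
    with tarfile.open(fileobj=payload, mode="r:") as tar:
        rel = {m.name: [m.offset_data, m.size] for m in tar if m.isfile()}
    index_blob = json.dumps(rel).encode("utf-8")
    # Pass 2: index member first, then the same members in the same order.
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(INDEX_NAME)
        info.size = len(index_blob)
        tar.addfile(info, io.BytesIO(index_blob))
        _write_members(tar, files)
    return out.getvalue()


def read_via_index(archive, name):
    f = io.BytesIO(archive)
    header = tarfile.TarInfo.frombuf(f.read(BLOCK), tarfile.ENCODING, "surrogateescape")
    if header.name != INDEX_NAME:
        raise ValueError("archive does not start with an index member")
    index = json.loads(f.read(header.size))
    base = BLOCK + _padded(header.size)  # first byte after the index member
    offset, size = index[name]
    f.seek(base + offset)
    return f.read(size)


if __name__ == "__main__":
    archive = pack_with_index({"a.txt": b"alpha", "b/c.bin": b"\x00\x01\x02"})
    print(read_via_index(archive, "b/c.bin"))
```

The result is still an ordinary tar that standard tools can list and extract; only readers that know about the index member get the shortcut.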

kevin_thibedeau 51 days ago

Romfs is more capable, simple to support, and doesn't have the overhead of tar's large headers and typical large blocking factors.

blipvert 51 days ago

Zip is a piece of cake.

I recently needed to embed noVNC into an app written in Go. Serving files via net/http from the zip file is practically a one-liner (then just a Gorilla websocket to take the place of websockify).

sixdimensional 51 days ago

I second the idea to use a zip. In fact it's what a lot of vendors do because it is so ubiquitous, even Microsoft: the Office Open XML document format (.docx, .xlsx, .pptx) is just a zip file containing a bunch of folders and XML files.
