Hacker News new | ask | show | jobs
by mort96 1428 days ago
Well, there are reasons. If your archive format handles compression, it can be designed in such a way that you can seek and extract only parts of the archive. If the archive format doesn't handle compression, you're dependent on reading through the archive sequentially from start to finish.

That's not to say tar is wrong to not have native compression, it's just one reason why it's not crazy for archive formats to natively support compression.

1 comments

I’m semi-sure that this is possible with .tar.gz files already. I’ve used vim to view a text file within a few different rather large archives without noticing the machine choke up on extracting several gigs before showing content. Certainly nothing was written to disk in those cases.
.tar.gz files can only be read sequentially, but there are optimizations in place on common tools that make this surprisingly fast as long as there's enough memory available to essentially mmap the decompressed form. The problem is bigger with archives in the tens of GB (actually pretty common for tarballs since it's popular as a lowest-common-denominator backup format) or resource-constrained systems where the swapping becomes untenable.
There are extensions for gzip that can make it coarsely seekable, I wouldn't be surprised if some archive tools used that.