by vasi 3588 days ago
xz in multithreaded mode supports random access too, at least theoretically. But with xz alone there's no reasonable way to actually find the file you want to access in a tarball; that's the bit pixz provides.

Another nice thing about pixz is it does parallel decompression, as well as compression.

(Disclaimer: I'm the original author of pixz.)

1 comments

I was thinking about that "no reasonable way" comment. When you uncompress the first block, you will find the first tar header. From its size field you can work out the uncompressed offset of the next tar header. If the compressed stream does support random access, you should be able to uncompress just the block containing that offset (assuming the uncompressed block size is a multiple of 512 bytes, so that a header never straddles two blocks) to get to the next tar header. You can repeat this until you get to the file you are looking for.
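A rough sketch of that header walk, in Python. The `read_at(offset, length)` callback is hypothetical: it stands in for whatever random-access read the decompressor would offer, and is not an xz or pixz API. The sketch also assumes plain octal size fields and short names stored in the header's first 100 bytes, so it ignores GNU base-256 sizes and long-name extension entries.

```python
BLOCK = 512  # tar headers and data padding are both 512-byte units


def entry_size(header: bytes) -> int:
    """Parse the octal size field (bytes 124..135) of a tar header."""
    field = header[124:136].strip(b"\0 ")
    return int(field.decode("ascii") or "0", 8)


def next_header_offset(header: bytes, offset: int) -> int:
    """Uncompressed offset of the header that follows the one at `offset`."""
    padded = (entry_size(header) + BLOCK - 1) // BLOCK * BLOCK
    return offset + BLOCK + padded


def find_member(read_at, wanted: str):
    """Hop from header to header, reading only the 512 bytes of each header.

    `read_at(offset, length)` is a hypothetical random-access read over the
    uncompressed stream. Returns (data_offset, size), or None if not found.
    """
    offset = 0
    while True:
        header = read_at(offset, BLOCK)
        # A short read or an all-zero record marks the end of the archive.
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return None
        name = header[:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        if name == wanted:
            return offset + BLOCK, entry_size(header)
        offset = next_header_offset(header, offset)
```

With a real seekable decompressor behind `read_at`, each call would only need to uncompress the one block containing that offset.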

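With large files, this approach would be of huge value, since every block that lies entirely inside a file's data can be skipped. If the files tend to be no larger than block_size - 512, there will be no speedup, because every block then contains at least one header and has to be uncompressed anyway.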
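To put rough numbers on that, here is a small Python sketch that counts which uncompressed blocks contain a tar header and would therefore have to be uncompressed while walking the archive. The function name and the example sizes are made up for illustration, and it assumes the block size is a multiple of 512 and that every entry is a single header followed by padded data.

```python
BLOCK = 512  # tar record size


def blocks_with_headers(file_sizes, block_size):
    """Return (set of block indices holding a header, total block count)."""
    if block_size % BLOCK != 0:
        raise ValueError("block_size must be a multiple of 512")
    touched = set()
    offset = 0
    for size in file_sizes:
        touched.add(offset // block_size)  # block holding this header
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        offset += BLOCK + padded  # header plus padded file data
    total = (offset + block_size - 1) // block_size
    return touched, total


# Three 100000-byte files, 4096-byte blocks: only 3 of 74 blocks are needed.
print(blocks_with_headers([100000] * 3, 4096))
# Eight files of exactly block_size - 512 bytes: all 8 blocks are needed.
print(blocks_with_headers([3584] * 8, 4096))
```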

Of course, this would need to be implemented directly in tar, not by piping the output of a decompression command through tar.