Hacker News
by emmelaich 932 days ago
Fun fact: in a sense, gzip can have multiple files, but not in a specially useful way ...

    $ echo meow > cat
    $ echo woof > dog
    $ gzip cat
    $ gzip dog
    $ cat cat.gz dog.gz > animals.gz
    $ gunzip animals.gz
    $ cat animals
    meow
    woof

4 comments

> ... but not in a specially useful way ...

It can be very useful: https://github.com/google/crfs#introducing-stargz

It is specially useful; it is not especially/generally useful lol

It could be a typo, though I think when we say something "isn't specially/specifically/particularly useful" we mean "compared to the set of all features, this specific feature is not that useful", not that the feature isn't useful for specific things.

Indeed! I should have written "especially", not "specially".

Imo all file formats should be concatenable when possible. Thankfully, Zstandard also supports this by design, which is a huge boon for combining files.

Fun fact: tar files are also (semi-)concatenable; you just need to pass `-i` when extracting. This means compressed (gz/zstd) tar files are (semi-)concatenable too!
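
For anyone who wants to try the tar version, here is a minimal sketch. It assumes GNU tar and uses the long form `--ignore-zeros` (which `-i` abbreviates); the file names are made up for illustration, reusing the cat/dog example from the top of the thread.

```shell
# Sketch: concatenate two tar archives and read them back with --ignore-zeros.
# Assumes GNU tar; file names are illustrative only.
work=$(mktemp -d)
cd "$work"

echo meow > cat
echo woof > dog
tar -cf cat.tar cat
tar -cf dog.tar dog
cat cat.tar dog.tar > animals.tar

# --ignore-zeros makes tar skip the zero blocks that end the first archive
# and keep reading into the second one.
all=$(tar --ignore-zeros -tf animals.tar)

# Same idea after gzip: the gzip members concatenate, and so do the tars inside.
gzip -c cat.tar > cat.tar.gz
gzip -c dog.tar > dog.tar.gz
cat cat.tar.gz dog.tar.gz > animals.tar.gz
all_gz=$(tar --ignore-zeros -tzf animals.tar.gz)

printf '%s\n' "$all" "$all_gz"
```

Without `--ignore-zeros`, tar should treat the end-of-archive marker of the first archive as the end of the whole file, which is why the flag is needed.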

WARC files (used by the Internet Archive to power the Wayback Machine, among others) use this trick too, to get a compressed file format that is seekable to individual HTTP request/response records.
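
A rough sketch of why per-record gzip members make a file seekable, using the cat/dog files again rather than real WARC records. The "index" here is just two shell variables (`offset`, `length`), which is an assumption for illustration; real tooling would keep the offsets in some separate index.

```shell
# Sketch: treat each gzip member as one "record" and jump straight to the
# second record by byte offset, without decompressing the first.
work=$(mktemp -d)
cd "$work"

echo meow | gzip -c > cat.gz
echo woof | gzip -c > dog.gz
cat cat.gz dog.gz > animals.gz

# The second member starts where the first one ends.
# $(( ... )) also strips any padding some wc implementations print.
offset=$(($(wc -c < cat.gz)))
length=$(($(wc -c < dog.gz)))

# Skip `offset` bytes, take `length` bytes, and decompress only that slice.
record=$(tail -c +"$((offset + 1))" animals.gz | head -c "$length" | gunzip -c)
echo "$record"
```

This prints `woof`: the slice is a complete gzip member on its own, so gunzip never has to look at the bytes before it.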

Wow, that's surprising (at least to me)!

Is there a limit in the default gunzip implementation? I'm aware of the concept of ZIP/tar bombs, but I wouldn't have expected gunzip to ever produce more than one output file, at least when invoked without options.

It only produces one output. It's just a stream of data.
Ah, I somehow imagined a second `cat` in there. That makes more sense, thank you!

The limit is that it doesn't do filenames or other metadata; it's limited to contents.
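
A quick way to check both points (one output, no per-file names coming back) is to run the original example in an empty directory and list what is left afterwards. This is only a sketch; the temporary directory and variable names are made up.

```shell
# Sketch: after gunzip, only one file exists, and it holds both payloads.
work=$(mktemp -d)
cd "$work"

echo meow > cat
echo woof > dog
gzip cat
gzip dog
cat cat.gz dog.gz > animals.gz
rm cat.gz dog.gz

gunzip animals.gz

# The directory now holds a single file named after the archive;
# the original names `cat` and `dog` are not recreated.
files=$(ls)
contents=$(cat animals)
printf '%s\n' "$files" "$contents"
```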