| ZIP isn't a terrible format, but it has a couple of flaws and limitations which make it a less than ideal format for long-term archiving. The biggest ones I'd call out are: 1) The format has limited and archaic support for file metadata - e.g. file modification times are stored as a MS-DOS timestamp with a 2-second (!) resolution, and there's no standard system for representing other metadata. 2) The single-level central directory can be awkward to work with for archives containing a very large number of members. 3) Support for 64-bit file sizes exists but is a messy hack. 4) Compression operates on each file as a separate stream, reducing its effectiveness for archives containing many small files. The format does support pluggable compression methods, but there's no straightforward way to support "solid" compression. 5) There is technically no way to reliably identify a ZIP file, as the end of central directory record can appear at any location near the end of the file, and the file can contain arbitrary data at its start. Most tools recognize ZIP files by the presence of a local file header at the start ("PK\x01\x02"), but that's not reliable. |
I do it by ignoring ZIP's native compression entirely, using store-only ZIP files and then compressing the whole thing at the filesystem level instead.
Here's an example comparison of the same WWW site rip in a DEFLATE ZIP, in a store-only ZIP with zstd filesystem compression, in a tar with same zstd filesystem compression (identical size but less useful for seeking due to lack of trailing directory versus ZIP), and finally the raw size pre-zipping:
This probably wouldn't help GP with their need for HTTP seeking since their HTTP server would incur a decompress+recompress at the filesystem boundary.