by Lammy 246 days ago
> there's no straightforward way to support "solid" compression.

I do it by ignoring ZIP's native compression entirely, using store-only ZIP files and then compressing the whole thing at the filesystem level instead.
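
A minimal sketch of the store-only step, assuming Python's standard `zipfile` module. The function name `make_store_zip` and its arguments are made up for illustration; the tool actually used for the archives below isn't stated here.

```python
import os
import zipfile

def make_store_zip(src_dir: str, zip_path: str) -> int:
    """Pack every file under src_dir into an uncompressed (store-only) ZIP.

    Returns the number of files added. Hypothetical helper, for illustration only.
    zip_path should live outside src_dir so the archive does not include itself.
    """
    count = 0
    # ZIP_STORED writes member bytes verbatim; compression is left to the filesystem.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the archive is relocatable.
                zf.write(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count
```

The resulting file only shrinks on disk if it sits on a dataset with compression enabled, as in the `zfs get` output further down.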

Here's an example comparison of the same WWW site rip in a DEFLATE ZIP, in a store-only ZIP with zstd filesystem compression, in a tar with the same zstd filesystem compression (nearly identical size, but less useful for seeking than ZIP because tar has no trailing directory), and finally the raw size pre-zipping:

  982M preserve.mactech.com.deflate.zip
  408M preserve.mactech.com.store.zip
  410M preserve.mactech.com.tar
  3.8G preserve.mactech.com


  [Lammy@popola] zfs get compression spinthedisc/Backups/WWW
  NAME                     PROPERTY     VALUE           SOURCE
  spinthedisc/Backups/WWW  compression  zstd            local
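
The seeking advantage of ZIP's trailing directory can be sketched roughly like this: look a member up in the central directory, then compute where its raw bytes sit in the archive. This assumes a store-only, unencrypted archive and Python's `zipfile`; `stored_member_range` is a hypothetical helper name, not part of any library.

```python
import struct
import zipfile

def stored_member_range(zip_path: str, member: str) -> tuple[int, int]:
    """Return (offset, length) of a stored member's raw bytes inside the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # getinfo() is answered from the central directory at the end of the file.
        info = zf.getinfo(member)
        if info.compress_type != zipfile.ZIP_STORED:
            raise ValueError("member is compressed; its bytes are not directly usable")
    with open(zip_path, "rb") as f:
        f.seek(info.header_offset)
        header = f.read(30)  # fixed-size part of the local file header
    if header[:4] != b"PK\x03\x04":
        raise ValueError("unexpected local file header signature")
    # File name length and extra field length are the last two 16-bit fields.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    return info.header_offset + 30 + name_len + extra_len, info.file_size
```

A plain `seek` plus `read` of that range returns the original file contents without touching any other member, which a tar can't offer without scanning from the start.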

This probably wouldn't help GP with their need for HTTP seeking since their HTTP server would incur a decompress+recompress at the filesystem boundary.

lool why use zip then anyways? put them in a folder
It's for when you have a very large number of mostly-identical files, like web pages with consistent header and footer. If 408MiB versus 3.8GiB is a meaningless difference to you then sure don't bother with compression, but why I want it should be very obvious to most people here.
you completely missed what i asked you but ok
I don't think I did, but please explain :)

The last example in my list of four file sizes is them in a folder. Filesystem compression works at the file level, so you have to turn many-almost-identical-files into one file in order to benefit from it. ZFS does have block-level deduplication, but that's its own can of worms that shouldn't be turned on flippantly due to resource requirements and `recordsize` tuning needed to really benefit from it.
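
A toy illustration of why one big file helps, assuming synthetic pages and using Python's stdlib `zlib` as a stand-in for zstd. It only shows the cross-file redundancy effect; it doesn't model how ZFS actually compresses records, and `per_file_vs_solid` is a made-up name.

```python
import zlib

def per_file_vs_solid(pages: list[bytes]) -> tuple[int, int]:
    """Return (total size with each page compressed alone, size with all pages compressed as one stream)."""
    per_file = sum(len(zlib.compress(p)) for p in pages)
    solid = len(zlib.compress(b"".join(pages)))
    return per_file, solid

# Synthetic stand-in for a site rip: shared header and footer, tiny unique body.
header = b"<html><head><title>Example</title></head><body><nav>" + b"<a href='/x'>link</a>" * 20 + b"</nav>"
footer = b"<footer>" + b"<p>legal text</p>" * 20 + b"</footer></body></html>"
pages = [header + f"<p>article {i}</p>".encode() + footer for i in range(200)]

per_file, solid = per_file_vs_solid(pages)
print(f"per-file: {per_file} bytes, solid: {solid} bytes")
```

Compressed separately, every page pays for its own copy of the header and footer; compressed as one stream, later pages can refer back to earlier ones.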

you do not need dedup just use reflinks for everything. if that workflow does not work then eh i understand why you would use zips

although zfs dedup is probably better in 2025