Wow. Thank you for making this. I frequently have to zip and unzip ~100 GB of zip archives, and even on a fast NVMe, 32-core workstation I waste 10 minutes waiting. I know about zstd and pigz, but the format must be zip.
7-Zip by Igor Pavlov can create zip files and is multi-threaded. In my small test against "pzip", it was just as fast in "real" time and produced a smaller file (while using a similar amount of CPU, distributed differently between user and sys).
Testing with the 100 MB set from mattmahoney.net, with relatively comparable output sizes, pzip is twice as fast as Pavlov's 7z mentioned above. That's clearly useful for anyone who needs the fastest possible creation of a "classic" zip with compressed files, as long as a lower compression ratio is acceptable (the pzip archive was 1.6 MB bigger than 7z's when compressing the 100 MB set).
$ time zip -2 -r a-zip.zip 100mb/ >/dev/null
real user sys: 2,1 1,8 0,1
$ time 7z -tzip -mx=1 a a-7z-1.zip 100mb/ >/dev/null
real user sys: 1,0 2,7 0,0
$ time ../pzip a-pzip.zip 100mb/ >/dev/null
real user sys: 0,5 1,0 0,1
$ L a
48197707 a-7z-1.zip
49921626 a-pzip.zip
49553097 a-zip.zip
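Putting the listing above in relative terms is simple arithmetic. A tiny Go program makes it explicit; the sizes and "real" times are copied from the listing, nothing is newly measured, and the constant and function names are just illustrative.

```go
package main

import "fmt"

// Archive sizes in bytes and "real" (wall-clock) seconds, copied from the listing above.
const (
	size7z   = 48197707 // a-7z-1.zip, 7z -tzip -mx=1
	sizePzip = 49921626 // a-pzip.zip, pzip
	sizeZip  = 49553097 // a-zip.zip, zip -2

	real7z   = 1.0
	realPzip = 0.5
)

// overhead reports how many bytes, and what percentage, b is larger than a.
func overhead(a, b int64) (int64, float64) {
	d := b - a
	return d, 100 * float64(d) / float64(a)
}

func main() {
	d, p := overhead(size7z, sizePzip)
	fmt.Printf("pzip vs 7z:     +%d bytes (%.2f%%), %.2f MiB\n", d, p, float64(d)/(1<<20))
	d, p = overhead(sizeZip, sizePzip)
	fmt.Printf("pzip vs zip -2: +%d bytes (%.2f%%)\n", d, p)
	fmt.Printf("pzip wall-clock speedup vs 7z: %.1fx\n", real7z/realPzip)
}
```

With these numbers the pzip archive is 1,723,919 bytes larger than 7z's (about 1.64 MiB, i.e. the "1.6 MB" above, or roughly 3.6%), and only about 0.74% larger than the zip -2 one.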
If "classic" compatibility (i.e. being able to unpack the archive with older programs) is not important, it may be worth considering that, since at least 2020, zstd has officially been a "standard" ZIP compression method too, allowing even faster compression for the same compressed-size targets.
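For the compatibility question, what matters is the per-member method ID in the ZIP headers: the methods nearly every unzip tool understands are Store (0) and Deflate (8), while PKWARE's APPNOTE.TXT assigns 93 to Zstandard. Below is a minimal sketch using Go's archive/zip (CreateRaw needs Go 1.17 or newer) that lists the members an older tool would likely not handle. The helper names are hypothetical, and the demo "zstd" member is only a header with placeholder bytes, not real zstd data.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// Zstandard's method ID in PKWARE's APPNOTE.TXT.
const methodZstd = 93

// nonClassicEntries lists members compressed with anything other than Store or Deflate.
// It only inspects headers, so it never needs a decompressor for the method.
func nonClassicEntries(archive []byte) ([]string, error) {
	r, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	var out []string
	for _, f := range r.File {
		if f.Method != zip.Store && f.Method != zip.Deflate {
			out = append(out, fmt.Sprintf("%s (method %d)", f.Name, f.Method))
		}
	}
	return out, nil
}

// buildDemo creates an in-memory archive with one Deflate member and one member
// whose header claims method 93. Its payload is NOT real zstd data; only the
// header matters for nonClassicEntries.
func buildDemo() ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	fw, err := w.Create("readme.txt") // Create always uses Deflate
	if err != nil {
		return nil, err
	}
	if _, err := fw.Write([]byte("hello")); err != nil {
		return nil, err
	}
	payload := []byte("placeholder")
	rw, err := w.CreateRaw(&zip.FileHeader{
		Name:               "data.bin",
		Method:             methodZstd,
		CompressedSize64:   uint64(len(payload)),
		UncompressedSize64: uint64(len(payload)),
	})
	if err != nil {
		return nil, err
	}
	if _, err := rw.Write(payload); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildDemo()
	if err != nil {
		panic(err)
	}
	entries, err := nonClassicEntries(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(entries)
}
```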
Note that even without the speedup from compressing in multiple threads, the libraries used for compression here use much less CPU (user 3m33s) than "the standard zip utility" (user 13m13s, i.e. 3.7 times as much; if I understand correctly, this "standard" is Info-ZIP). That is a little less surprising given that the source of the latter hasn't been updated in 15 years, while, if I understand correctly, this new Go version relies on the compression routines maintained in https://pkg.go.dev/compress/flate
I also don't see a comparison of the compressed sizes produced by the two programs.
I will add the resulting compressed sizes; there is not much between them (pzip was around 2% larger for the 10 GB directory). That said, I have some optimizations in mind that will bring this down further.
Allowing a 2% bigger output file could mean a huge speedup in these circumstances, even with the same compression routines, judging by these benchmarks of zlib and zlib-ng at different compression levels:
Do you know which distributions are using zlib-ng for zip and unzip programs?
If I understand correctly, the improvement would be roughly 3 times less CPU use for a comparable output size, but here I see it shown for "minizip", not zip:
How does this compare against pigz? [1] AFAIK pigz comes bundled with some modern distros, and I've personally used it reliably in some backup operations.
That's generally true, but in theory pigz can extract single-member zip archives. I assume they are both equally fast, given that both use zlib; libdeflate or ISA-L should speed this up significantly.
Furthermore, for compression this might still be a valid question, especially in the single-file case, because it sounds like pzip parallelizes over the file members and cannot speed up compression or decompression of a single member.