by acqq 999 days ago
Note that even setting aside the speedup from compressing in multiple threads, the libraries used for compression here use much less CPU time (user 3m33s) than "the standard zip utility" (user 13m13s, i.e. about 3.7 times as much). If I understand correctly, this "standard" is Info-ZIP. That is a little less surprising knowing that the source of the latter hasn't been updated for 15 years while, if I understand correctly, this new Go version depends on the compression routines maintained in https://pkg.go.dev/compress/flate
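For anyone who wants to check the 3.7 figure, here is a minimal sketch of the arithmetic. The helper name `to_seconds` is my own, and it assumes the user times are whole-second values in the `XmYs` form quoted above.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'XmYs' string such as '13m13s' to whole seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


# User CPU times quoted in the comment above.
info_zip = to_seconds("13m13s")  # 793 seconds
pzip = to_seconds("3m33s")       # 213 seconds

ratio = info_zip / pzip
print(f"Info-ZIP used {ratio:.2f}x the user CPU time of pzip")
```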

I also don't see the comparison of the resulting compression sizes of the two programs.


Yep the standard I refer to is Info-ZIP (zip(1)).

I will add the resulting compression sizes. There is not much between them (pzip's output was around 2% larger for the 10GB directory), although I do have some optimizations in mind which will bring this down further.

Allowing for a 2% bigger resulting file could mean a huge speedup in these circumstances, even with the same compression routines; see these benchmarks of zlib and zlib-ng at different compression levels:

https://github.com/zlib-ng/zlib-ng/discussions/871

IMO a fair comparison of the real speed improvement brought by a new program is only possible between almost identical resulting compressed sizes.
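A rough way to see the size/time trade-off yourself is the sketch below, which uses Python's standard `zlib` module. The sample input and the names `make_sample` and `compare_levels` are hypothetical, chosen only for illustration; this is not the data or the method behind the pzip benchmark or the zlib-ng discussion, and the timings will vary by machine and by which zlib build Python is linked against.

```python
import time
import zlib


def make_sample(n_lines: int = 20000) -> bytes:
    # Hypothetical input: repetitive log-like text, not the 10GB directory
    # discussed in the thread.
    return b"".join(
        b"%d GET /item/%d status=200\n" % (i, i * 7 % 1000)
        for i in range(n_lines)
    )


def compare_levels(data: bytes, levels=(1, 6, 9)):
    """Return {level: (compressed_size_bytes, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Round-trip check so only valid outputs are compared.
        assert zlib.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    data = make_sample()
    baseline = None
    for level, (size, secs) in compare_levels(data).items():
        if baseline is None:
            baseline = size
        # Size relative to the fastest level, to compare like with like.
        print(f"level {level}: {size} bytes "
              f"({size / baseline:.1%} of level 1), {secs * 1000:.1f} ms")
```

The point is only that a speed comparison should be read next to the size column: a lower level that gives up a couple of percent in size can account for a good share of a speed difference on its own.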

This also depends on whether your distro is using zlib or zlib-ng, which is significantly faster.
Do you know which distributions are using zlib-ng for zip and unzip programs?

If I understand correctly, the improvement would be around 3 times less CPU use for a comparable resulting size, but I see it shown there for "minizip", not zip:

https://github.com/zlib-ng/zlib-ng/discussions/871