by joelthelion 1508 days ago
In the Era of zstandard, do we really need this?

I find it somewhat telling that they don't benchmark themselves against zstd.

Right now I'm almost exclusively using zstd (general stuff) or lzma2/xz (high compression where read speed doesn't matter). And of course gz and zip for data interchange where compatibility is key. From the information presented bzip3 won't replace any of those use cases for me, but that's fine. Maybe it fits somebody else's use case, or maybe it's the foundation for the next great algorithm that we all end up using.

zstd -19 linux.tar 462.58s user 0.76s system 100% cpu 217M memory 7:42.56 total

% wc -c linux.tar.zst linux.bz3
134980904 linux.tar.zst
129255792 linux.bz3

  # compression

  bzip3 -j 4 -e linux-5.18-rc6.tar linux-5.18-rc6.tar.bz3 
    user: 345.48s system: 0.59s cpu: 373% total: 1:32.75

  zstd -19 --long -T4 -f linux-5.18-rc6.tar
    user: 1270.48s system: 0.89s cpu: 376% total: 5:37.9

  > du -b linux-5.18-rc6.tar.* | sort -rn | reln
  1.000000  130907738  linux-5.18-rc6.tar.zst
  0.994715  130215881  linux-5.18-rc6.tar.bz3
With an additional '--ultra -22', the tar.zst is smaller, but the compression time skyrockets.

  # decompression 

  bzip3 -j 4 -d linux-5.18-rc6.tar.bz3 linux-5.18-rc6.tar 
    user: 222.57s system: 0.92s cpu: 362% total: 1:01.69

  bzip3 -d linux-5.18-rc6.tar.bz3 linux-5.18-rc6.tar 
    user: 141.29s system: 0.89s cpu: 99% total: 2:22.19

  zstd -d -T4 -f linux-5.18-rc6.tar.zst 
    user: 2.26s system: 0.84s cpu: 99% total: 3.102
zstd doesn't seem to support parallel decoding, but it is still about 20x faster than bzip3 on four threads.
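For anyone who wants to check the ratio, here is a minimal Python sketch that converts the `total:` wall-clock values from the timings above into seconds and divides them. The helper name `parse_elapsed` is made up for illustration; the input strings are copied from the decompression runs.

```python
def parse_elapsed(text: str) -> float:
    """Convert a wall-clock string such as '1:01.69' or '3.102' to seconds."""
    seconds = 0.0
    # Each ':' moves one unit up (seconds -> minutes -> hours).
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# 'total' values copied from the decompression timings in this comment.
bzip3_4_threads = parse_elapsed("1:01.69")
bzip3_1_thread = parse_elapsed("2:22.19")
zstd_total = parse_elapsed("3.102")

ratio_4_threads = bzip3_4_threads / zstd_total
ratio_1_thread = bzip3_1_thread / zstd_total
```

Here `ratio_4_threads` comes out a little under 20 and `ratio_1_thread` is roughly 46, so the "20x" figure compares zstd with the four-thread bzip3 run.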
Is reln a command to add a column of relative numbers to the left? Neat.
Yes, a small python script.
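The script itself was not posted, so the following is only a guess at what something like it could look like: a sketch that assumes the input is `du -b` style lines (size, whitespace, name) and that each ratio is taken relative to the first line. The function name `relative_column` is hypothetical.

```python
def relative_column(lines):
    """Prefix each 'SIZE NAME' line with SIZE divided by the first line's SIZE."""
    rows = []
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        size = int(parts[0])
        name = parts[1].strip() if len(parts) > 1 else ""
        rows.append((size, name))
    if not rows:
        return []
    base = rows[0][0]  # assumption: the first row is the reference value
    return [f"{size / base:.6f}  {size}  {name}" for size, name in rows]


# Sizes copied from the du output above.
example = relative_column([
    "130907738\tlinux-5.18-rc6.tar.zst\n",
    "130215881\tlinux-5.18-rc6.tar.bz3\n",
])
```

To use it as a filter you would feed it `sys.stdin` and print each returned row. A first line with size 0 would raise `ZeroDivisionError`, which this sketch does not handle.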
Have you ever tried lzma/lzma2 with the hc3 (hash chain) match finder instead of the default (bt3 or bt4) match finder? I've found this to be a really good middle ground between gz/deflate and lzma2 with default settings.
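If you want to try this without the xz command line, Python's standard `lzma` module exposes the match finder through a custom filter chain. A minimal sketch, assuming preset 6 as the source of the remaining defaults (my choice, not something stated above); the function name `compress_xz_hc3` and the sample data are illustrative only.

```python
import lzma


def compress_xz_hc3(data: bytes, preset: int = 6) -> bytes:
    """Compress to .xz with LZMA2, overriding the match finder with hc3."""
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": preset,    # defaults for every option not set explicitly
        "mf": lzma.MF_HC3,   # hash chain match finder
    }]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


# Repetitive sample input, only to show that the round trip works.
sample = b"compression is not a completed task. " * 2000
packed = compress_xz_hc3(sample)
restored = lzma.decompress(packed)
```

Whether hc3 is actually a good middle ground depends on the data, so time it against the plain preset on your own files before relying on it.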
Yes, because someone said the same when zstandard came out. This may not have the same strong points, but maybe the next will… compression is not a completed task.
Not to mention the restrictive license which effectively prohibits its use in any Open Source project licensed under anything other than GPLv3.
Frankly, same holds for gzip. I've been planning to relicense bzip3 with the more permissive LGPLv3.
gzip has BSD-licensed compatible alternatives already. It's doubtful the same attention would be given to bzip3; chicken-and-egg scenario there. Plus the lingering question of "Why not zstd?"
Gzip is just a frontend for zlib, which is BSD(ish).
1. zStandard is not a standard

2. Bzip2 is somewhat of a standard

3. zStandard is not a substitute for Bzip2

In what way is bzip2 more of a "standard" than zstd? bzip2 doesn't even seem to have any official reference description of its file format; just an "unofficial" one[1], whereas zstd is RFC 8478[2].

When I evaluated various compression algorithms a few years ago, zstd came out ahead of bzip2 in every metric.

[1]: https://github.com/dsnet/compress/blob/master/doc/bzip2-form...

[2]: https://datatracker.ietf.org/doc/html/rfc8478

That is interesting.

The author of lzip has harsh criticism of xz, and admiration of bzip2 for error detection/correction and "rightsizing" the container format.

I use lzip in preference to xz unless I need portability.

https://www.nongnu.org/lzip/xz_inadequate.html

As far as I know, xz and zstd are completely unrelated?