Hacker News
by lifthrasiir 1768 days ago
> Show me any zstd level that has significantly different speed and data size than one of the levels of those three can't match.

In case you are not just trolling: I had a very large, text-heavy JSON file lying around on my SSD, and the compressed sizes and compression times were as follows [1]:

    8,826,654,133       original
    4,763,212,322  0:29 lz4 -1
    3,815,508,500  0:52 brotli -1
    3,715,002,172  2:09 lz4 -3
    3,668,204,232  2:27 gzip -1
    3,159,659,113  1:59 brotli -2
    3,118,316,529 10:55 lzma -0 -T4
    3,025,746,073  1:21 zstd -3 -T1 (default)

Zstandard is not just lzma+gzip+lz4. It is better than everything you've mentioned, at least for my test case. In fact, I compressed with zstd first and then tried to match its compression time with the others, because zstd is very fast and using anything else as the reference point would have taken me much more time. It does have enough reason to call itself "standard".
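
To make the table above easier to compare, here is a small sketch that turns each row into a fraction of the original size and an input throughput. The numbers are copied from the table; the helper names (`parse_mmss`, `summarize`) are made up for illustration, and MB here means 10^6 bytes.

```python
ORIGINAL_BYTES = 8_826_654_133

# (compressed bytes, elapsed "m:ss", command), copied from the table above.
RESULTS = [
    (4_763_212_322, "0:29", "lz4 -1"),
    (3_815_508_500, "0:52", "brotli -1"),
    (3_715_002_172, "2:09", "lz4 -3"),
    (3_668_204_232, "2:27", "gzip -1"),
    (3_159_659_113, "1:59", "brotli -2"),
    (3_118_316_529, "10:55", "lzma -0 -T4"),
    (3_025_746_073, "1:21", "zstd -3 -T1 (default)"),
]


def parse_mmss(text):
    """Convert an 'm:ss' string into a number of seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(results, original_bytes=ORIGINAL_BYTES):
    """Return one dict per row with the size ratio and input throughput."""
    rows = []
    for size, elapsed, command in results:
        secs = parse_mmss(elapsed)
        rows.append({
            "command": command,
            "ratio": size / original_bytes,            # compressed / original
            "mb_per_s": original_bytes / secs / 1e6,   # input bytes consumed per second
        })
    return rows


for row in summarize(RESULTS):
    print(f"{row['command']:<24} {row['ratio']:6.1%} of original  {row['mb_per_s']:7.1f} MB/s")
```

The timings are only given to the second, so the throughput figures are rough, especially for the fastest rows.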

[1] Tested on an i7-7700 at 3.60GHz with 48 GiB of RAM (but none of the algorithms tested uses more than several MB of memory anyway). I'm using a pretty fast SSD here, so I/O speed is not much of a concern. Also note that every algorithm except lzma is single-threaded.
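
For anyone who wants to repeat this kind of measurement, a minimal timing harness might look like the sketch below. This is not the script used for the numbers above (those came from running the tools by hand); `time_compressor` is a hypothetical helper that feeds a file to a command on stdin, counts the bytes written to stdout, and measures wall-clock time.

```python
import subprocess
import time


def time_compressor(cmd, src_path):
    """Run cmd with src_path on stdin; return (bytes written to stdout, seconds)."""
    start = time.perf_counter()
    with open(src_path, "rb") as src:
        proc = subprocess.Popen(cmd, stdin=src, stdout=subprocess.PIPE)
        out_bytes = 0
        # Read in 1 MiB chunks so the compressed output is never held in memory.
        for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
            out_bytes += len(chunk)
        proc.stdout.close()
        returncode = proc.wait()
    elapsed = time.perf_counter() - start
    if returncode != 0:
        raise RuntimeError(f"{cmd!r} exited with status {returncode}")
    return out_bytes, elapsed
```

Usage would be something like `time_compressor(["zstd", "-3", "-T1", "-c"], "big.json")` or `time_compressor(["gzip", "-1", "-c"], "big.json")`, where `big.json` is a placeholder file name. Discarding the output instead of writing it to disk keeps write speed out of the measurement, which is a choice of this sketch and not necessarily how the numbers above were taken.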


Sounds like you're comparing multi-threaded zstd to other utilities that are single-threaded (e.g. gzip instead of pigz).

I duped /usr/share/dict/words into an 8GB file and did a couple tests on the old system I'm on:

  8589934592  original
  3075315128  pigz -1       1m0.44s
  2926825272  zstdmt -3     2m37.52s
  2877549999  pigz -3       1m28.94s
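
A test file like the one above can be built by writing a small file over and over until a target size is reached. The sketch below is one way to do that; the exact method used for the numbers above isn't stated, and `duplicate_to_size` is a hypothetical helper name.

```python
def duplicate_to_size(src_path, dst_path, target_bytes):
    """Repeat the contents of src_path into dst_path until target_bytes are written."""
    with open(src_path, "rb") as src:
        block = src.read()
    if not block:
        raise ValueError("source file is empty")
    written = 0
    with open(dst_path, "wb") as dst:
        while written < target_bytes:
            # Truncate the final copy so the output is exactly target_bytes long.
            piece = block[: target_bytes - written]
            dst.write(piece)
            written += len(piece)
    return written
```

For the 8589934592-byte file in the table, the call would be `duplicate_to_size("/usr/share/dict/words", "words8g", 8 * 1024**3)`, with `words8g` as a placeholder output name.
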
Zstd by default is single-threaded. Just in case though...

    8,826,654,133       original
    3,033,695,892  0:22 zstd 1.3.3 -3 -T4
    3,025,746,073  1:21 zstd 1.3.3 -3 -T1 (default)
    3,017,972,162  1:05 zstd 1.5.0 -3 -T1 (default)
    3,013,860,663  1:23 zstd 1.5.0 -3 --single-thread

It is just no match.

EDIT: axiolite wanted to see --single-thread, and while my Linux box only has an older zstd (1.3.3) without that option, I realized I do have a Windows executable of a recent zstd (1.5.0). Both executables were run on the same machine, but I can't guarantee anything about the 1.5.0 binary.

zstd is NOT single-threaded by default. It's dual threaded. You have to pass the --single-thread option to make it single threaded.

I think you need to try pigz...

--single-thread probably doesn't do what you think: it forces zstd to serialize I/O with the compression job, and its existence doesn't mean that compression itself is multi-threaded with -T1 [EDIT]. I can't try that anyway because my zstd version is slightly older (1.3.3), but I can try pigz:

    3,664,854,169  0:42 pigz -1 -p4
    3,664,854,169  0:28 pigz -1 -p8 (default, also probably the max possible with my box)

Now I'm very curious where you got your copy of zstdmt. (I'm using stock Ubuntu packages for all of them.)

[EDIT] Was "it doesn't make compression itself multi-threaded", which could falsely imply that --single-thread somehow enables multi-threaded compression. It doesn't ;)

Using stock RHEL7/EPEL packages: zstd-1.5.0-1.el7.x86_64, on an old Athlon II X4 615e right now.

The difference in our results certainly is curious.

Very interesting. It might be the case that zstd optimizes more for recent machines; zstd famously uses four different compression streams to maximize instruction-level parallelism, and that might not work well on older machines. I haven't seen any machine where zstd is significantly slower than it should be, but the machines I could test on were all from 2013 or later. Or the RHEL package might have been optimized for recent machines. It would be interesting to test a binary optimized for the current machine (-march=native -mtune=native).

pigz is no match for zstd...