by esotericn 2360 days ago
Multithreaded xz is non-deterministic and so it's not a candidate.
2 comments

How is it non-deterministic? Works pretty consistently for me with pixz.
The bytes of the compressed file are non-deterministic and depend on the number of cores used, system load and other “random” factors.
Can't you set those parameters during compression to something fixed? Should be doable.
Or you can just use zstd.

The xz tool is not deterministic when compressing. The packaging team might change upstream for a few things, but diving into the innards of a compression tool is expecting a bit much.

We are talking about decompression speed and not compression. Decompression is necessarily deterministic.
The compression speed is also an issue for developers. In many cases the compression step takes longer than the rest of the build.
Maybe the point is that the compressed package can change every time, which is an issue for the reproducible builds that many distros are now adopting. Though I'm not sure why parallelized xz can't behave in a predictable fashion.
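For illustration, the reproducibility concern boils down to a byte-for-byte comparison of repeated runs. Below is a minimal sketch using Python's standard `lzma` module, which compresses on a single thread. The helper names `xz_digest` and `is_reproducible` are hypothetical, and checking `xz -T` or pixz themselves would mean invoking those tools rather than this library call.

```python
import hashlib
import lzma


def xz_digest(data: bytes, preset: int = 6) -> str:
    """Return the SHA-256 hex digest of the .xz-compressed form of data."""
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    return hashlib.sha256(compressed).hexdigest()


def is_reproducible(data: bytes, runs: int = 3) -> bool:
    """True if every run yields byte-identical compressed output."""
    return len({xz_digest(data) for _ in range(runs)}) == 1


if __name__ == "__main__":
    payload = b"example package contents\n" * 10_000
    print("reproducible:", is_reproducible(payload))
```

A build system would compare digests across machines and tool versions as well, which this single-process sketch does not cover.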
No, I mean you don’t need to compress in parallel. Compression speed doesn’t matter, and the output is compatible with single- or multi-threaded decompression.
Compression speed can matter in general (to improve build times).

For xz, you need to compress with chunking (and maybe indexing for more benefit), in order to allow parallel decompression to begin with. Otherwise xz produces a blob which you can't split into independent parts during decompression, which makes using many decompression threads pointless.

But yes, if parallel compression is creating non-determinism, you can do all the compression work with chunking without parallelism, still allowing parallel decompression. But I'm not sure why it even has to create non-determinism in the first place.
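To make the chunking idea concrete, here is a rough sketch in Python, not a description of how `xz -T` or pixz work internally. It assumes a fixed chunk size and writes each chunk as an independent .xz stream (the real tools use blocks inside one stream). `CHUNK_SIZE`, `compress_chunked` and `decompress_chunked` are hypothetical names. Because chunk boundaries depend only on the input and results are collected in input order, the output should not depend on the number of workers.

```python
from __future__ import annotations

import lzma
from concurrent.futures import ThreadPoolExecutor

# Assumption: a fixed chunk size, so boundaries depend only on the input.
CHUNK_SIZE = 1 << 20  # 1 MiB


def split(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut data into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_one(chunk: bytes) -> bytes:
    """Compress a single chunk as a complete, independent .xz stream."""
    return lzma.compress(chunk, format=lzma.FORMAT_XZ, preset=6)


def compress_chunked(data: bytes, workers: int = 1) -> list[bytes]:
    """Compress every chunk, using up to `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whichever thread finishes first.
        return list(pool.map(compress_one, split(data)))


def decompress_chunked(streams: list[bytes], workers: int = 4) -> bytes:
    """Decompress the independent streams in parallel and reassemble them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lzma.decompress, streams))


if __name__ == "__main__":
    payload = bytes(range(256)) * 20_000
    serial = compress_chunked(payload, workers=1)
    parallel = compress_chunked(payload, workers=4)
    print("same bytes:", serial == parallel)
    print("round trip:", decompress_chunked(parallel) == payload)
```

The concatenated streams still form a valid .xz file that a single-threaded decoder can read. The cost is a somewhat worse ratio, since each chunk starts with an empty dictionary, and a real archive format would also need to record the chunk offsets.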