by danbst 1831 days ago
Until quite recently, large projects such as Python and GCC were not reproducible. I'm not sure where the history of non-r13y is recorded.

---

There is a Debian initiative to create bit-for-bit reproducible builds for all their software (well, all the critical software).

https://reproducible-builds.org/

R13y is akin to "computer proofs" in math -- if you don't have it, that's fine, but if you have it, that's awesome.

There are practical reasons to favor reproducibility too, but those are more for distro maintainers.

The fact that NixOS (not Debian) got this 100% is mostly because

- minimal image has a small subset of packages (https://hydra.nixos.org/build/146009592#tabs-build-deps)

- Nix tooling was created 15 years ago *exactly* for this; Nix is made to make packages bit-for-bit rebuildable from scratch.

- Nix/Nixpkgs is growing in number of maintainers and got more funds

- Nix has fewer Docker/Snap pragmatics

3 comments

>- Nix tooling was created 15 years ago exactly for this; Nix is made to make packages bit-for-bit rebuildable from scratch.

I don't think this is accurate?

Nix is about reproducing system behaviour, largely by capturing the dependency graph and replaying the build. But this doesn't entail bit-for-bit identical binaries. It very much sits in the same group as Docker and similar technologies. This is also how I read the original thesis from Eelco[0].
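To make the distinction concrete, here is a toy Python sketch. It is not Nix's actual store-path computation; `input_hash` and `toy_build` are hypothetical names, and the embedded timestamp is just one assumed source of nondeterminism. It shows a build keyed by its inputs that still produces different bytes on each run.

```python
import hashlib

def input_hash(sources, deps, build_script):
    """Key a build by everything that goes into it, not by what comes out."""
    h = hashlib.sha256()
    for part in [build_script, *sorted(sources), *sorted(deps)]:
        # Length-prefix each part so boundaries between parts are unambiguous.
        h.update(len(part).to_bytes(8, "big") + part)
    return h.hexdigest()

def toy_build(sources, build_time):
    """Hypothetical build step that embeds a timestamp in its output."""
    return b"".join(sorted(sources)) + b"built-at:" + str(build_time).encode()

sources = [b"int main(void) { return 0; }"]
deps = [b"libc-2.33"]
script = b"cc -o hello hello.c"

# Same inputs on two machines: the input-derived key is identical...
key_a = input_hash(sources, deps, script)
key_b = input_hash(sources, deps, script)

# ...but the outputs differ because the build recorded when it ran.
out_a = toy_build(sources, build_time=1000)
out_b = toy_build(sources, build_time=2000)
```

In this model the dependency graph is fully captured and the build is replayable, yet `out_a` and `out_b` would not hash to the same value.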

And well, claims like this always rub me the wrong way, since NixOS only really started using the term "reproducible builds" after Debian started their efforts in 2015-2016[1], and started their own reproducible builds effort later. It also muddies the language, since people are now talking about "reproducible builds" in terms of system behavior as well as bit-for-bit identical builds. The result has been that people talk about "verifiable builds" instead.

[0]: https://edolstra.github.io/pubs/phd-thesis.pdf

[1]: https://github.com/NixOS/nixpkgs/issues/9731

> There's no good reason (that I can think of) why this shouldn't have been the case all along

Determinism can decrease performance dramatically. For example, concatenating items (say, object files into a library) in a fixed order is clearly more expensive in both time and space than processing them in whatever order they arrive. The former requires you to store everything in memory and sort it before you start doing any work, whereas the latter lets you do your work in a streaming fashion. Enforcing determinism can turn an O(1)-space/O(n)-time algorithm into an O(n)-space/O(n log n)-time one, increasing latency and decreasing throughput. You wouldn't take a performance hit like that without a good reason to justify it.
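A minimal Python sketch of that trade-off, assuming object files can arrive in any order (simulated here by reversing a list). The function names are hypothetical and this is not how any real archiver is implemented; hashing stands in for "writing the library".

```python
import hashlib

def archive_streaming(items):
    """Consume items as they arrive: O(1) extra space, result depends on order."""
    h = hashlib.sha256()
    for name, data in items:
        h.update(name.encode() + b"\0" + data)
    return h.hexdigest()

def archive_deterministic(items):
    """Buffer all items and sort by name first: O(n) space, O(n log n) time."""
    h = hashlib.sha256()
    for name, data in sorted(items):
        h.update(name.encode() + b"\0" + data)
    return h.hexdigest()

# Sixteen fake object files; contents are arbitrary placeholder bytes.
objects = [(f"obj{i}.o", bytes([i]) * 8) for i in range(16)]
in_order = list(objects)
reversed_order = list(reversed(objects))

# Generators model a stream: the streaming variant never holds more than one item.
stream_a = archive_streaming(item for item in in_order)
stream_b = archive_streaming(item for item in reversed_order)

sorted_a = archive_deterministic(item for item in in_order)
sorted_b = archive_deterministic(item for item in reversed_order)
```

The streaming variant yields a different result for each arrival order, while the sorted variant agrees across both at the cost of buffering every item before doing any work.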

Being bit-for-bit reproducible means you could do fun things like distribute packages as just sources and a big blob of signatures, and you can still run only signed binaries.
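A rough Python sketch of that idea, under loud assumptions: `build` is a hypothetical deterministic compiler, and a set of SHA-256 digests stands in for the "blob of signatures". A real system would verify a public-key signature over each digest rather than trust a bare list.

```python
import hashlib
import hmac

def build(source):
    """Hypothetical deterministic build: same source bytes, same output bytes."""
    return b"BIN:" + source[::-1]

def digest(binary):
    return hashlib.sha256(binary).hexdigest()

def is_trusted(binary, trusted_digests):
    """Accept a locally built binary only if its digest is in the manifest."""
    d = digest(binary)
    return any(hmac.compare_digest(d, t) for t in trusted_digests)

source = b"print('hello')"

# Distributor side: build once and publish only the source plus the digest.
manifest = {digest(build(source))}

# User side: rebuild from source and check the result against the manifest.
local_binary = build(source)
tampered_binary = local_binary + b"\x90"
```

Because the rebuild is bit-for-bit identical, `is_trusted(local_binary, manifest)` holds without the binary ever being shipped, while any modified binary fails the check.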