by jcranmer 2138 days ago
Byte-for-byte identical builds are useful mainly because format-agnostic diffing tools are a lot easier to use. There are quite a lot of cases where something is not byte-for-byte identical merely because of a timestamp that affects nothing.

That said, I think it was (and perhaps still is, to many people) surprising just how many sources of irreproducibility exist. Timestamps and absolute build paths are the obvious ones, and they generally don't affect how the output behaves. Iterating over files in directory (inode) order (i.e., "for each file in directory {}") is usually innocuous, but it can change link order and with it the order of static constructors, which can have drastic effects on the resulting binary, both in performance and in actual functionality.
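
To make the directory-order point concrete, here is a minimal Python sketch of the usual fix: sort the listing before handing it to anything order-sensitive. The function name `link_inputs` and the `.o` filter are made up for illustration and are not taken from any particular build system.

```python
import os

def link_inputs(directory, deterministic=True):
    """Return the .o files in `directory` in the order a (hypothetical)
    build step would pass them to the linker."""
    # os.listdir makes no ordering promise; the order depends on the filesystem.
    names = [n for n in os.listdir(directory) if n.endswith(".o")]
    if deterministic:
        # Sorting by name makes the order independent of the filesystem.
        names.sort()
    return [os.path.join(directory, n) for n in names]
```

With `deterministic=False` the same set of files comes back, but the order is whatever the filesystem happens to return, which is exactly the kind of difference that can leak into link order.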

But if your build process is going to unexpectedly change the encoding of text files [1], that's actually pretty terrifying. There are also cases where the compiler just seems to randomly choose how to optimize code [2]. Note that the randomness isn't coming from an obvious "if rand() % 4" check here, but perhaps from something more subtle such as "we're iterating over a map whose keys are addresses of internal data structures, and we stop optimizing after hitting 1000 entries as the function is too big, and the addresses change because of link order or ASLR."
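
A toy Python model of that last scenario, purely as a sketch: the "addresses" are hard-coded integers standing in for pointer values, `optimize_pass` and the budget of 2 are invented for illustration, and no real compiler is claimed to work this way.

```python
def optimize_pass(values, budget, key):
    """Visit `values` in the order given by `key` and 'optimize' only the
    first `budget` of them, mimicking a size cutoff."""
    ordered = sorted(values, key=key)
    return [v["name"] for v in ordered[:budget]]

# Two runs over the same input: same names, but one address moved
# (as it might under ASLR or a different link order).
run_a = [{"name": "f", "addr": 0x1000}, {"name": "g", "addr": 0x2000}, {"name": "h", "addr": 0x3000}]
run_b = [{"name": "f", "addr": 0x9000}, {"name": "g", "addr": 0x2000}, {"name": "h", "addr": 0x3000}]

def by_addr(v):
    return v["addr"]  # order depends on where things landed in memory

def by_name(v):
    return v["name"]  # order depends only on the input

print(optimize_pass(run_a, 2, by_addr), optimize_pass(run_b, 2, by_addr))
print(optimize_pass(run_a, 2, by_name), optimize_pass(run_b, 2, by_name))
```

Keyed by address, the two runs optimize different functions even though nothing in the code calls a random number generator; keyed by a stable name, they agree.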

[1] https://tests.reproducible-builds.org/debian/issues/unstable...

[2] https://tests.reproducible-builds.org/debian/issues/unstable...