| HN Mirror


by rocqua 2670 days ago

> If you build the same code on two different machines, using the same compiler, with the same options, then the generated binaries should be exactly the same.

There is so much context that is normally embedded into a binary that this is usually not true unless explicit measures have been taken.

Two very common sources that introduce variability are time-stamps used in the build, and environment variables such as $HOME and $USER.
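To illustrate the time-stamp case, here is a minimal Python sketch. It assumes the build honours the `SOURCE_DATE_EPOCH` convention from the reproducible-builds.org specification; `build_timestamp` and `banner` are hypothetical names for illustration, not part of any real build system.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp (seconds since the Unix epoch) to embed in a build.

    Assumption: the build honours SOURCE_DATE_EPOCH. If it is unset we fall
    back to the wall clock, which is exactly the non-reproducible behaviour
    described above.
    """
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return int(epoch)
    return int(time.time())


def banner(environ=None):
    """Format the embedded build stamp in UTC so the local timezone can't leak in."""
    stamp = datetime.fromtimestamp(build_timestamp(environ), tz=timezone.utc)
    return "built " + stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
```

With `SOURCE_DATE_EPOCH` pinned to the same value on both machines, `banner()` returns the same string on each; without it, every build embeds a different stamp.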


bregma 2670 days ago

If you're generating or modifying source code at build time (e.g. adding timestamps or build IDs) then you have violated the constraints on build reproducibility.

detaro 2670 days ago

If you define the problem as excluding things a large percentage of real-world build systems do by default, then it's not very interesting. The interesting part of Debian's and others' work here is making this work with small, unintrusive changes to such systems.

rbanffy 2669 days ago

As long as the intrusive changes are taken upstream, I've no problem with it.

anth_anm 2670 days ago

What if you have a multi-threaded backend to the compiler that happens to lay down data in different orders?

jabl 2669 days ago

You don't even need multi-threading. In gcc we had at least one case where a key=>value data structure was keyed by memory address, causing symbols to be emitted in different order depending on ASLR, phase of the moon, or whatever.
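A rough Python sketch of that failure mode, not the actual gcc code: `Symbol`, `emit_by_address` and `emit_by_name` are hypothetical names, and CPython's `id()` (which is the object's memory address) stands in for the address-keyed table.

```python
class Symbol:
    """Hypothetical stand-in for a compiler symbol."""

    def __init__(self, name):
        self.name = name


def emit_by_address(symbols):
    """Emit symbol names in order of id(), i.e. by object identity.

    In CPython id() is the memory address, so this order can change from
    run to run depending on where the allocator places the objects.
    """
    table = {id(sym): sym for sym in symbols}
    return [table[key].name for key in sorted(table)]


def emit_by_name(symbols):
    """Emit symbol names ordered by a stable property (the name) instead."""
    return [sym.name for sym in sorted(symbols, key=lambda s: s.name)]
```

Both functions emit the same set of symbols; only `emit_by_name` promises the same order on every run.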

tantalor 2670 days ago

That's a bug.

https://en.wikipedia.org/wiki/Race_condition

lixtra 2669 days ago

Why?

Most compilers give no guarantees about the order in which they lay out the data. I love deterministic processes as much as anyone. But randomized approaches have their advantages too. And if a compiler has reasons to randomize output, e.g. for speed, then it’s a trade-off to consider.

anth_anm 2669 days ago

Thread finishes work

grabs lock

writes to file

writes to index

releases lock

That's not a race condition. The output order doesn't matter, but it is nondeterministic.
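The steps above can be sketched in Python as follows. This is only an illustration under assumptions: `compile_unit` is a hypothetical stand-in for a backend job, and the two `build_*` functions are made-up names. The first mirrors the lock-and-append pattern; the second does the same parallel work but collects results in input order.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def compile_unit(name):
    """Hypothetical stand-in for a backend job; returns (name, output)."""
    return name, "code for " + name


def build_completion_order(units, workers=4):
    """Each worker appends under a lock: no data race, but the order of
    entries depends on which thread finishes first."""
    out = []
    lock = threading.Lock()

    def job(name):
        result = compile_unit(name)
        with lock:
            out.append(result)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, units))  # drain the iterator so all jobs run
    return out


def build_input_order(units, workers=4):
    """Same parallel work, but Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_unit, units))
```

Both versions produce the same set of results. The second keeps the parallelism and only fixes the order in which results are collected, so the output layout no longer depends on thread scheduling.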

tedunangst 2669 days ago

Why is it a bug? I write a program to download four files. I do so in parallel. Sometimes X finishes first, sometimes Y finishes first, and the files are written to disk in a different order. Why do I want to serialize this operation?

rbanffy 2669 days ago

The end result is a set of four files. You don't care about the order they are laid out on the disk and the next steps shouldn't let the order of those files influence the end result.

Let's assume there is a latent bug in the compiler that gets triggered if file four is the first one. Good luck debugging that.

tedunangst 2669 days ago

But parent claimed that creating a set with a different order was inherently a bug. Not that depending on the order of an unordered set was a bug.

anth_anm 2669 days ago

It's the internal structure of the files.

aidenn0 2670 days ago

Don't forget the absolute paths of the source files...

anth_anm 2670 days ago

It always bugged me that that's considered part of reproducibility.

That's 100% controllable and deterministic.

aidenn0 2669 days ago

Until very recently you needed root access to do it on Linux (user namespaces can let you do it without root).
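An alternative to forcing every build into the same directory is to rewrite the paths before they are embedded, which is similar in spirit to gcc's `-fdebug-prefix-map` option. A minimal Python sketch of the idea, assuming POSIX-style paths; `map_prefix` is a hypothetical helper, not an existing tool:

```python
import posixpath


def map_prefix(path, build_root, replacement="."):
    """Rewrite an absolute source path so it no longer depends on where
    the tree was checked out. Paths outside build_root are left alone.
    """
    path = posixpath.normpath(path)
    root = posixpath.normpath(build_root)
    if path == root:
        return replacement
    if path.startswith(root + "/"):
        # Keep only the part of the path below the build root.
        return posixpath.join(replacement, path[len(root) + 1:])
    return path
```

Two checkouts under different home directories then map `main.c` to the same relative path, so the embedded string is identical on both machines.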