by Luker88 2209 days ago
Except that gcc and clang are nondeterministic, to an extent

Sometimes the compiler needs to generate a random value and base part of the compilation on that value. Things like trying to predict which is the best branch, or things done at compile time.

A lot of work has been done to reduce the nondeterminism, and some of it can only be removed by passing things like "-frandom-seed=$your_git_commit", for example.
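For what it's worth, here is a minimal sketch of wiring that flag into a build script. `-frandom-seed` is a real GCC option; the helper name `gcc_command`, the file names, and the choice of a commit id as the seed are assumptions for illustration only.

```python
def gcc_command(source: str, output: str, seed: str) -> list:
    """Return a gcc argv that compiles one file with a fixed random seed."""
    if not seed:
        raise ValueError("seed must be a non-empty string")
    # The same seed string on every build replaces GCC's random numbers.
    return ["gcc", "-c", source, "-o", output, f"-frandom-seed={seed}"]


if __name__ == "__main__":
    # Hypothetical commit id; a real script might read it from `git rev-parse HEAD`.
    print(" ".join(gcc_command("foo.c", "foo.o", "0123abcd")))
```

Any string that stays stable across rebuilds of the same source should do; the commit id is just a convenient one.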

Build determinism also goes deeper than that: for example, static libraries are archives that embed timestamps (each member header records a modification time), and so on.
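To make the archive point concrete, here is a rough sketch that builds a tiny ar archive in memory, assuming the common 60-byte member header layout (name, mtime, uid, gid, mode, size) with short GNU-style names. The function names are made up for the example. As far as I know, GNU ar's `D` modifier zeroes these fields for the same reason.

```python
import hashlib

AR_MAGIC = b"!<arch>\n"


def ar_member(name: str, data: bytes, mtime: int) -> bytes:
    """Encode one member with a 60-byte header (short names only)."""
    if len(name) > 15:
        raise ValueError("this sketch only handles names up to 15 characters")
    header = (
        f"{name + '/':<16}"   # member name, '/'-terminated
        f"{mtime:<12}"        # modification time, decimal seconds
        f"{0:<6}"             # uid
        f"{0:<6}"             # gid
        f"{'100644':<8}"      # mode, octal
        f"{len(data):<10}"    # payload size in bytes
        "`\n"                 # header terminator
    ).encode("ascii")
    padding = b"\n" if len(data) % 2 else b""  # members are 2-byte aligned
    return header + data + padding


def make_archive(members, mtime: int) -> bytes:
    """Concatenate the magic string and all encoded members."""
    return AR_MAGIC + b"".join(ar_member(n, d, mtime) for n, d in members)


if __name__ == "__main__":
    members = [("foo.o", b"\x7fELF")]
    for stamp in (1600000000, 1600000060, 0, 0):
        blob = make_archive(members, stamp)
        print(stamp, hashlib.sha256(blob).hexdigest()[:16])
```

The payload is identical in every run; only the timestamp field changes, which is enough to change the hash of the whole archive.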

The simplest programs might generate the same hash, but don't expect all code to generate the same-hash binary by default
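If you want to check this yourself, a small sketch along these lines compares two build outputs and reports where they first diverge. The helper names are hypothetical, and it works on raw bytes, so you would read the two binaries with something like `open(path, "rb").read()` first.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if the inputs are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


if __name__ == "__main__":
    # Stand-ins for two builds of the same source.
    build_1 = b"header|sym_1a2b|code"
    build_2 = b"header|sym_9f8e|code"
    print(sha256_hex(build_1) == sha256_hex(build_2))
    print(first_difference(build_1, build_2))
```

The offset is only a starting point; it tells you which region of the file to inspect for timestamps, generated symbol names, and the like.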

2 comments

> Sometimes the compiler needs to generate a random value and base part of the compilation on that value.

As a compiler engineer with experience (among others) in LLVM and GCC this is the first time I'm hearing of this. Could you provide more details or a source?

I can't imagine where such behavior would be useful, let alone required. The only slightly plausible scenario I can think of would be representing some internal data structures as hash tables with random seeds to avoid denial-of-service attacks. But then the compiled code would still have to rely on, at some point, picking an arbitrary element out of such a hash table. I can't think of contexts inside a compiler where this would be a useful thing to do.

No experience here, but from the manpage:

-frandom-seed=string

  This option provides a seed that GCC uses in place of random numbers in generating certain symbol names that have to be different in every compiled file.  It is also used to place unique stamps in coverage data files and the object files that produce them.  You can use the -frandom-seed option to produce reproducibly identical object files.

though my example/guess on branch prediction is probably wrong

Ah, thanks. Looks like this only applies to symbol names, so the generated executable code should be deterministic, only symbol table entries might differ.

> The simplest programs might generate the same hash, but don't expect all code to generate the same-hash binary by default

Debian and others have put quite a lot of work into reproducible software builds:

https://wiki.debian.org/ReproducibleBuilds#Even_more

This of course only works if the compiler cooperates.

The bug linked earlier is a regression in Clang 10. Clang 9 was deterministic for the same file, flags, etc.