by chisophugis 3409 days ago

Every incremental linking technique I'm aware of involves overwriting the output file and does not guarantee that identical input files and command line lead to identical (bit-exact) output files.

Incremental linking is not so easy under that constraint, since the output depends on the previous output file (which may not even be there).

(and considering the previous output file to be an "input file" follows the letter of the requirement but not the spirit; the idea is that the program invocation is a "pure function" of the inputs, which enables caching and eliminates a source of unpredictable behavior)
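To make the "pure function" point concrete, here is a minimal sketch of how a build cache could key a link step, assuming the output really does depend only on the command line and the bytes of the input files. The function name `link_cache_key` and the choice of SHA-256 are illustrative assumptions, not something any particular linker or build system is known to do.

```python
import hashlib


def link_cache_key(command_line, input_files):
    """Return a hex digest covering the command line and every input's bytes.

    command_line: list of str, the linker arguments in order
    input_files:  list of (name, contents) pairs in command-line order,
                  where contents is a bytes object
    """
    h = hashlib.sha256()

    def feed(field):
        # Length-prefix every field so ["ab", "c"] and ["a", "bc"] differ.
        h.update(len(field).to_bytes(8, "little"))
        h.update(field)

    # Record how many arguments there are so arguments cannot be
    # confused with file names.
    h.update(len(command_line).to_bytes(8, "little"))
    for arg in command_line:
        feed(arg.encode("utf-8"))
    for name, contents in input_files:
        feed(name.encode("utf-8"))
        feed(contents)
    return h.hexdigest()
```

If the linker is deterministic, two invocations with the same key can share one cached output. If the output also depends on a previous output file, the key no longer identifies the result and the cache is unsound.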

We have had to reject certain parallelization strategies in LLD as well because even though the result would always be a semantically identical executable, it would not be bit-identical. See e.g. the discussions surrounding parallel string merging:

https://reviews.llvm.org/D27146 <-- fastest technique, but non-deterministic output

https://reviews.llvm.org/D27152 <-- slower but deterministic technique

https://reviews.llvm.org/D27155 <-- really cool technique that relies on a linearly probed hash table (and sorting just runs of full buckets instead of the entire array) to guarantee deterministic output despite concurrent hash table insertion.
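As a rough illustration of how a parallel merge can still be bit-exact (this is not the algorithm from any of the three reviews, just a simpler sketch of the general idea), the strings can be partitioned into shards by a stable hash, each shard deduplicated independently, and the results concatenated in shard order. The shard count, the use of CRC32, and the name `merge_strings` are all assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count, fixed so the layout never varies


def merge_strings(strings, workers=4):
    """Deduplicate byte strings; return (table, offsets).

    table:   merged bytes, each unique string followed by a NUL
    offsets: dict mapping each string to its offset in table
    """
    # Shard with a stable hash. Python's built-in hash() is randomized
    # per process, which would itself make the output non-deterministic.
    shards = [[] for _ in range(NUM_SHARDS)]
    for s in strings:
        shards[zlib.crc32(s) % NUM_SHARDS].append(s)

    def dedup(shard):
        # dict keeps first-seen order, so this depends only on input order.
        return list(dict.fromkeys(shard))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in shard order, whichever worker finishes first.
        unique_per_shard = list(pool.map(dedup, shards))

    table = bytearray()
    offsets = {}
    for unique in unique_per_shard:
        for s in unique:
            offsets[s] = len(table)
            table += s + b"\0"
    return bytes(table), offsets
```

The layout depends only on the input order and the hash function, never on thread scheduling. The cost is the extra partitioning pass, which is the kind of slowdown the deterministic variants pay relative to a free-for-all concurrent insert.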


comex 3409 days ago

As I said in a different reply, I think nondeterminism is an acceptable sacrifice for development builds, which is where incremental linking would be most useful. That said, it's definitely possible to get some speedup from incrementality while keeping the output deterministic; you'd have to move symbols around, which of course requires relocating everything that points to them, but (with the help of a cache file that stores where the relocations ended up in the output binary) this could probably be performed significantly more quickly than re-reading all the .o files and doing name lookups. But admittedly this would significantly reduce the benefit.


chisophugis 3409 days ago

I agree. It's definitely possible. It's just that the actual benefit is far from reducing link time to "O(changes in the input)" and it would introduce significant complexity into the linker (and keeping LLD simple and easy to follow is a high priority). It's definitely an open research area.

> That said, it's definitely possible to get some speedup from incrementality while keeping the output deterministic; you'd have to move symbols around, which of course requires relocating everything that points to them, but (with the help of a cache file that stores where the relocations ended up in the output binary) this could probably be performed significantly more quickly than re-reading all the .o files and doing name lookups. But admittedly this would significantly reduce the benefit.

Yeah. It's not clear if that would be better in practice than a conservative padding scheme + a patching-based approach.

"move symbols around, which of course requires relocating everything that points to them" sounds a lot like what the linker already spends most of its time doing (in its fastest mode).

In its fastest mode, LLD actually spends most of its time memcpy'ing into the output file and applying relocations. This happens after symbol resolution and does not touch the input .o files except to read the data being copied into the output file. The information needed for applying the relocations is read with a bare minimum of pointer chasing (only 2 serially dependent cache misses last I looked) and does not do any hash table lookup into the symbol table nor does it look at any symbol name string.
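A toy model of that copy-and-relocate phase, to show why it needs no symbol names or hash lookups once resolution is done. This is a heavily simplified sketch, not LLD's data structures: `Section`, `Reloc`, `write_output`, the base address, and the single relocation kind (an absolute 64-bit little-endian address) are all assumptions for illustration.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Reloc:
    offset: int  # byte offset inside the section to patch
    target: int  # index of the section being pointed at (already resolved)
    addend: int  # constant added to the target's address


@dataclass
class Section:
    data: bytes
    relocs: list = field(default_factory=list)


def write_output(sections, base=0x1000):
    # Layout pass: place sections back to back.
    starts = []
    pos = 0
    for sec in sections:
        starts.append(pos)
        pos += len(sec.data)

    out = bytearray(pos)
    for sec, start in zip(sections, starts):
        # The "memcpy": copy the section's bytes to their final position.
        out[start:start + len(sec.data)] = sec.data
        # Apply relocations: pure arithmetic on already-resolved targets.
        for r in sec.relocs:
            value = base + starts[r.target] + r.addend
            struct.pack_into("<Q", out, start + r.offset, value)
    return bytes(out)
```

Everything in the inner loop is an array index plus an add, which is why this phase is dominated by memory bandwidth rather than by lookups.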


comex 3408 days ago

> It's just that the actual benefit is far from reducing link time to "O(changes in the input)"

Not sure exactly what you mean by this. If you give up determinism, it can be O(changes) - except for time spent statting the input files which, at least in theory, should be possible to avoid by getting the info from the build system somehow. I can understand if LLD doesn't want to trade off determinism, but I personally think it should :)

One practical problem I can think of is ensuring that the binary isn't still running when the linker tries to overwrite bits of it. Windows denies file writes in that case anyway… On Unix that's traditionally the job of ETXTBSY, which I think Linux supports, but xnu doesn't. I guess it should be possible to fake it with APFS snapshots.

> In its fastest mode, LLD actually spends most of its time memcpy'ing into the output file and applying relocations. This happens after symbol resolution and does not touch the input .o files except to read the data being copied into the output file.

Interesting. What is this mode? How does it work if it's not incremental and it doesn't read the symbols at all?


chisophugis 3408 days ago

> Not sure exactly what you mean by this. If you give up determinism, it can be O(changes) - except for time spent statting the input files which, at least in theory, should be possible to avoid by getting the info from the build system somehow. I can understand if LLD doesn't want to trade off determinism, but I personally think it should :)

Not quite. For example, a change in the symbols in a single object file can cause different archive members to be fetched for archives later on the command line. A link can be constructed where that would be O(all inputs) changes due to a change in a single file.

Even though a practical link won't hit that pathological case, you still have to do the appropriate checking to ensure that it doesn't happen, which is an annoying transitive-closure/reachability type problem. (If you need a refresher on archive semantics, see the description here: http://llvm.org/devmtg/2016-03/Presentations/EuroLLVM%202016... Even with the ELF LLD using the Windows link.exe archive semantics (which are in practice compatible with traditional Unix archive semantics), the problem still remains.)

In practice, with the current archive semantics, any change to symbol resolution would likely be best served by bailing out from an incremental link in order to ensure correct output.
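To illustrate the archive problem, here is a small model of traditional Unix-style archive resolution, assuming the usual rule that an archive member is pulled in only if it defines a symbol that is currently undefined. The input encoding, the function name `resolve`, and all file and symbol names are made up for the example; weak symbols and duplicate definitions are ignored.

```python
def resolve(inputs):
    """Return the object files and archive members that end up in the link.

    inputs is a list, in command-line order, of either
      ("obj", name, defined_symbols, undefined_symbols)
      ("archive", name, [(member, defined_symbols, undefined_symbols), ...])
    """
    defined, undefined = set(), set()
    linked = []

    def add(name, defs, undefs):
        linked.append(name)
        defined.update(defs)
        undefined.difference_update(defs)
        undefined.update(u for u in undefs if u not in defined)

    for item in inputs:
        if item[0] == "obj":
            _, name, defs, undefs = item
            add(name, defs, undefs)
            continue
        _, name, members = item
        pulled = set()
        changed = True
        # Rescan the archive until no member satisfies an undefined symbol.
        while changed:
            changed = False
            for member, defs, undefs in members:
                if member not in pulled and undefined & set(defs):
                    pulled.add(member)
                    add(f"{name}({member})", defs, undefs)
                    changed = True
    return linked


LIBX = ("archive", "libx.a", [
    ("foo.o", {"foo"}, set()),
    ("fmt.o", {"fmt"}, set()),
    ("log.o", {"log_msg"}, {"fmt"}),
])

before = resolve([("obj", "main.o", {"main"}, {"foo"}), LIBX])
# One new call in main.o (log_msg) drags in two more members transitively.
after = resolve([("obj", "main.o", {"main"}, {"foo", "log_msg"}), LIBX])
```

Here `before` contains only `main.o` and `libx.a(foo.o)`, while `after` also contains `libx.a(log.o)` and `libx.a(fmt.o)`. An incremental linker would have to detect that kind of ripple before deciding a patch-in-place is safe.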

Note: some common things that one does during development actually do change the symbol table. E.g. printf debugging is going to add calls to printf where there were none. (and I think "better printf debugging" is one of the main use cases for faster link times). Or if you use C++ streams, then while printf-debugging you may have had `output_stream << "foo: " << foo << "\n"` where `foo` is a string, but then if you change to also output `bar` which is an int, you're still changing the symbol table of the object file (due to different overloads).

> Interesting. What is this mode? How does it work if it's not incremental and it doesn't read the symbols at all?

Compared to the default, mostly it just skips string merging, which is what the linker spends most of its time on otherwise for typical debug links (debug info contains tons of identical strings; e.g. file names of common headers). [1]

To clarify, there are two separate things:

- the fastest mode, which is mostly about skipping string merging. It's just like the default linking mode, it just skips some optional stuff that is expensive.

- the part of the profile where the linker spends most of its time in its fastest mode (memcpy + relocate); for example, I've measured this at 60% of the profile. This happens after symbol resolution and some preprocessing of the relocations.

Sorry for any confusion.

[1] The linker has "-O<n>" flags (totally different from the "-O<n>" family of flags passed to the compiler). Basically higher -O numbers (from -O0 to -O3 just like the compiler's, confusingly) cause the linker to do more "fancy stuff" like string deduplication, string tail merging, and identical code folding. Mostly these things just reduce binary size somewhat at a fairly significant link time cost vs "just spit out a working binary".
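For anyone unfamiliar with tail merging: a NUL-terminated string that is a suffix of another can reuse the longer string's storage (e.g. "bar" can point into the middle of "foobar"). A minimal sketch of the idea, not LLD's implementation, with the function name `tail_merge` chosen for the example:

```python
def tail_merge(strings):
    """Deduplicate byte strings and share storage between suffixes.

    Returns (table, offsets): table holds NUL-terminated strings, and
    offsets maps every input string to where it starts in table.
    """
    # Sort by reversed bytes, descending. Any string that is a suffix of
    # another then comes directly after some string that ends with it.
    ordered = sorted(set(strings), key=lambda s: s[::-1], reverse=True)

    table = bytearray()
    offsets = {}
    prev = None
    for s in ordered:
        if prev is not None and prev.endswith(s):
            # Point into the tail of the previous string; its NUL is shared.
            offsets[s] = offsets[prev] + len(prev) - len(s)
        else:
            offsets[s] = len(table)
            table += s + b"\0"
        prev = s
    return bytes(table), offsets
```

The sort is what makes this expensive relative to plain deduplication, which is consistent with it living behind a higher optimization level.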
