|
I don't get it. A linker's task should be straightforward. In essence, it looks up addresses from strings, and the number of strings is bounded by the lines of code written by humans. There must be a lot of incidental complexity if that task somehow becomes a bottleneck. And how can a binary called "cmd/compile" have 170k symbols (those are basically global definitions, right)? Not that that's a huge number for today's computing power, but how many millions of lines of source code does that correspond to? Still, 1M relocations, or 34MB of "Reloc" objects, as indicated, shouldn't be a huge issue to process. Object files should have minimal parsing overhead. Is there any indication of how long this takes to link? Shouldn't 1 or 2 secs be sufficient? (Assuming 100s of MB/s for sequential disk read/write, < 1us to store each of the 170k symbols in a hashmap, and < 1us to look up each of the 1M relocations.) - I don't think mmap should be used if it can be avoided. It means giving up control over which parts of the file are loaded in memory. And from a semantic point of view, that memory region still has to be treated differently, since on-disk and in-memory data structures are not the same. |
No, not just global definitions. Closures need linking, too. But the linker is doing much more than linking function entry points. Many automatic (on the stack) variables need linking so the GC can (a) trace the object graph and (b) move them when resizing the stack. Likewise, type definitions require metadata generation for GC tracing. And then there's all the debugging data that needs to be generated, which basically involves everything.
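For what it's worth, the "hashmap of symbols plus a lookup per relocation" model from the parent comment can be sketched in a few lines of Go. This is only a toy under stated assumptions: `Reloc`, `buildSymtab`, `resolve`, and `estimateSeconds` are hypothetical names made up for illustration, not the Go linker's actual types or API, and the sketch deliberately ignores everything listed above (GC metadata, stack maps, debug data), which is exactly the work the simple estimate leaves out.

```go
package main

import "fmt"

// Reloc is a hypothetical, simplified relocation: "patch the address of the
// symbol called Name into the output at Offset". Real object files carry more.
type Reloc struct {
	Offset int
	Name   string
}

// buildSymtab lays symbols out back to back and records each start address
// in a map keyed by symbol name (the "store each symbol in a hashmap" step).
func buildSymtab(names []string, sizes []int) map[string]int {
	symtab := make(map[string]int, len(names))
	addr := 0
	for i, name := range names {
		symtab[name] = addr
		addr += sizes[i]
	}
	return symtab
}

// resolve performs one map lookup per relocation and returns the address
// that would be patched in for each one.
func resolve(symtab map[string]int, relocs []Reloc) ([]int, error) {
	out := make([]int, len(relocs))
	for i, r := range relocs {
		addr, ok := symtab[r.Name]
		if !ok {
			return nil, fmt.Errorf("undefined symbol %q", r.Name)
		}
		out[i] = addr
	}
	return out, nil
}

// estimateSeconds reproduces the back-of-envelope arithmetic: one map
// operation per symbol and per relocation, each costing perOpMicros.
func estimateSeconds(symbols, relocs int, perOpMicros float64) float64 {
	return float64(symbols+relocs) * perOpMicros / 1e6
}

func main() {
	symtab := buildSymtab([]string{"main.main", "fmt.Println"}, []int{64, 128})
	addrs, err := resolve(symtab, []Reloc{{Offset: 8, Name: "fmt.Println"}})
	fmt.Println(addrs, err)

	// 170k symbols and 1M relocations at an assumed 1us per map operation.
	fmt.Printf("%.2f s\n", estimateSeconds(170000, 1000000, 1.0))
}
```

With the parent's numbers the map operations alone come to a bit over a second, so the 1 to 2 second guess is plausible for this step in isolation. The open question is how much the rest of the linker's work adds on top.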