The question is whether we would really get any benefit from this: compilation would take longer by some amount x, which may or may not be less than the time the linking step currently takes.
That's a question worth considering. But fundamentally, it should be faster to write compiler output directly to the final executable than to write it to an object file that is then copied into the final executable.
Is the slow part of linking the "copy all the bytes into the executable" step (in which case avoiding a separate link step is a clear win, since it saves a copy), or is it the "do all the relocations" work, which I think needs to be done anyway?
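To make the two kinds of work concrete, here is a toy sketch in Python. It is not how any real linker is implemented: the `link` function, the section tuples, and the single "32-bit absolute address" relocation type are all made up for illustration. Phase 1 is the byte copy, which a compiler writing straight into the executable could skip. Phase 2 is the relocation patching, which has to happen either way.

```python
import struct

def link(sections, relocations):
    """Toy linker: sections is a list of (name, bytes); relocations is a
    list of (section, offset, target_section, addend). Hypothetical format."""
    out = bytearray()
    base = {}
    # Phase 1: lay out the sections and copy their bytes into the output.
    for name, data in sections:
        base[name] = len(out)
        out += data
    # Phase 2: patch each relocation site with the final address of its
    # target, stored as a 32-bit little-endian value.
    for sec, off, target, addend in relocations:
        struct.pack_into("<I", out, base[sec] + off, base[target] + addend)
    return bytes(out)

# Four bytes of "code" followed by a 4-byte hole that must point at .data.
text = b"\x90" * 4 + b"\x00" * 4
data = b"hello\x00"
image = link([(".text", text), (".data", data)],
             [(".text", 4, ".data", 0)])
```

Here `.data` ends up at offset 8, so the hole in `.text` is patched to hold 8. Phase 2 cannot run until the layout from Phase 1 is known, which is why the relocation work does not go away when the copy does.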
I put my question to a friend of mine who works on linkers. His take was that for a single-threaded linker like ld.bfd the copy-bytes part would probably dominate, but that for a multithreaded linker like lld that part parallelizes trivially, so the slow part tends to be elsewhere. He also pointed me at a recent blog post by the lld maintainer on this topic: https://maskray.me/blog/2021-12-19-why-isnt-ld.lld-faster which I should go and read...