That's... what we're talking about. Simple symbols with calling conventions.

The rules for this proposed ABI are exactly the same as the existing amd64-SystemV C ABI, with one difference: the stack-to-stack copies aren't generated at the call-site; instead, the generated code at the call-site passes the address (in a register, or spilled to stack) for what it would have copied. The compiler generates the stack-to-stack copy in the generated function's prologue, using the address it was passed. Nothing more, nothing less. It's just moving the required location for certain generated code across the linkage, and keeping a temporary alive a little bit longer to make that work. (And in exchange, the temporary that the local stack variable gets put in isn't created at the call-site, so the register-file "pressure" of the change is net neutral.)
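To make that concrete, here's a rough source-level model of the two conventions in C. It's only a sketch: the real difference lives in generated machine code, not in anything you'd write by hand, and the `Big` struct and the function names are made up for illustration. `sum_current` stands in for today's behavior (the by-value copy is materialized on the caller's side), and `sum_proposed` stands in for the proposal (the caller hands over an address, and the callee's prologue makes the copy).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "large" value: 64 bytes, too big to travel in registers. */
typedef struct { long v[8]; } Big;

/* Models the current convention: the copy is made at the call site,
 * and the callee works directly on that copy. */
long sum_current(Big b) {
    b.v[0] += 1;                    /* touches only the callee's private copy */
    long s = 0;
    for (int i = 0; i < 8; i++) s += b.v[i];
    return s;
}

/* Models the proposed convention: the caller passes only an address;
 * the stack-to-stack copy moves into the callee's prologue. */
long sum_proposed(const Big *src) {
    Big b;
    memcpy(&b, src, sizeof b);      /* the copy that used to sit at the call site */
    b.v[0] += 1;                    /* still touches only the private copy */
    long s = 0;
    for (int i = 0; i < 8; i++) s += b.v[i];
    return s;
}

/* Self-check: same result either way, caller's value untouched. */
void demo_conventions(void) {
    Big x = {{1, 2, 3, 4, 5, 6, 7, 8}};
    assert(sum_current(x) == sum_proposed(&x));
    assert(x.v[0] == 1);
}
```

Either way the callee ends up with its own private copy; the only thing that moved is which side of the call the `memcpy` lives on.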

This is no more or less complex than the current ABI. It doesn't create more exceptions or edge-cases than the current ABI. It doesn't make the ABI harder to implement. The only thing it does is make a different choice on a basically arbitrary question: where to put some generated glue code (the stack-to-stack copy).

The only practical upshot of this change is that it lets compilers sometimes do an optimization that they can't currently do, because doing it would go against the rules of the amd64-SysV ABI (i.e. a caller that pushed a register instead of copying the value wouldn't be an amd64-SysV caller any more, and wouldn't be compatible with precompiled amd64-SysV callees any more; and vice versa for the callee).

But if-and-when a compiler does do that optimization, it's internal to the generated function. It doesn't mean that there are two potential callee "signatures" under the proposed ABI. There's only one.

Here's what the proposed ABI would probably say about stack copies:

> "The caller always passes large values by reference; the callee always receives them by reference. If the callee is taking a parameter pass-by-value, then it's up to the compiler of the callee to insert code into the callee's function prologue to turn the passed reference into a stack-local copy of the referenced data."

With that particular legalese, the callee's generated copy is still "required" by the spec, but its effects are now also "hidden" from the caller — i.e. its observable results are no longer leaking across the linkage. Therefore, the compiler is now empowered to optimize out the callee copy, as long as it can ensure the resulting code has observably equivalent results from the caller's perspective.

Note that this isn't anything the person implementing the ABI-targeting code in the compiler has to worry about. They just write the code to generate a callee function prologue that does a stack-to-stack copy. It's the person writing the optimization pass that comes after that codegen step who can now take that stack-to-stack copy and, with a static proof of read-only access by the callee in hand, drop it.
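Here's a sketch of what that pass would be doing, again modeled in C rather than in real codegen. The `Big` struct and both function names are hypothetical. `max_unoptimized` is what the straightforward ABI codegen would produce (prologue copy, then the body reads the copy); `max_optimized` is what's left after the copy has been proven unnecessary. The sketch assumes the body never writes to the value and that nothing else writes to `*src` while the call is running.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "large" value passed by address under the proposed ABI. */
typedef struct { long v[8]; } Big;

/* Straight ABI codegen: copy in the prologue, then read the local copy. */
long max_unoptimized(const Big *src) {
    Big local;
    memcpy(&local, src, sizeof local);
    long m = local.v[0];
    for (int i = 1; i < 8; i++)
        if (local.v[i] > m) m = local.v[i];
    return m;
}

/* After the optimization pass: the body only reads, so the copy is
 * dropped and reads go through the passed address. Same signature. */
long max_optimized(const Big *src) {
    long m = src->v[0];
    for (int i = 1; i < 8; i++)
        if (src->v[i] > m) m = src->v[i];
    return m;
}

/* Self-check: one function-pointer type covers both versions, so a
 * caller cannot tell which one it is linked against. */
void demo_same_linkage(void) {
    Big x = {{3, 9, -2, 7, 1, 0, 4, 8}};
    long (*callee)(const Big *) = max_unoptimized;
    long before = callee(&x);
    callee = max_optimized;
    assert(callee(&x) == before);
}
```

Both versions have the same signature and take the same argument, which is the point: the caller's side of the call is identical whether or not the copy survived optimization.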

The optimization opportunity enabled by the change isn't part of the ABI's spec. The proposed ABI is just about moving the stack-to-stack copy into the callee. What the compiler chooses to do when targeting an ABI where the callee does stack-to-stack copies is up to the compiler. Presumably, it will do "whatever fiendish things it can" at -O3, and "nothing much different" at -O0. Like usual.

And either way, the linkage itself looks the same. The optimization doesn't change the linkage. Any and all tooling that examines the linkage — debuggers, disassemblers, tracers, etc. — would see the same thing, whether the optimization has occurred or not. Because the optimization isn't part of the linkage; it's internal to the codegen of the callee, enabled by the (uniformly!) modified structure of the linkage.