by zackoverflow 917 days ago
> Note that the context pointer came after the “standard” arguments. All things being equal, “extra” arguments should go after standard ones. But don’t sweat it! In the most common calling conventions this allows stub implementations to be merely an unconditional jump.

I didn't know this. Does this optimization have a name? Where can I read more about it?

3 comments

This is because of the calling convention / ABI

If you write

    #include <stdlib.h>

    /* b is accepted but never used */
    void free_with_extra_args(int *a, int *b) {
        free(a);
    }
then a is already in the correct slot (the RDI register) for the first argument when free_with_extra_args is called. Whatever is passed as b is never touched. If you compile this with gcc -O2 you get

    free_with_extra_args:
        jmp     free
If you make the function call free(b) instead, you'll have to move b into the right place before calling free:

    free_with_extra_args:
        mov     rdi, rsi
        jmp     free
This is on x86-64 as summarized here https://en.wikipedia.org/wiki/X86_calling_conventions#System...

Wikipedia also has a nice summary of calling conventions on other platforms like ARM. All modern calling conventions are similar: pass the first args in registers and then use the stack as needed https://en.wikipedia.org/wiki/Calling_convention
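
To tie this back to the quoted advice about context pointers, here is a minimal sketch of an interface that puts the "extra" context argument last. The names alloc_fn, free_fn, stub_alloc and stub_free are made up for illustration and do not come from the article. Since ptr already sits in the first-argument register, an optimizing compiler targeting x86-64 System V is free to reduce stub_free to a single jmp to free, though nothing guarantees that it will.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback types: "standard" arguments first, context last. */
typedef void *(*alloc_fn)(size_t size, void *ctx);
typedef void (*free_fn)(void *ptr, void *ctx);

/* Stub that ignores ctx. size is already where malloc expects its
 * first argument, so no register shuffling is needed before the call. */
void *stub_alloc(size_t size, void *ctx) {
    (void)ctx;
    return malloc(size);
}

/* Same idea for free: ptr is already in the first-argument slot. */
void stub_free(void *ptr, void *ctx) {
    (void)ctx;
    free(ptr);
}
```

Had the context pointer come first, both stubs would need to move their second argument into the first-argument register before jumping, as in the mov rdi, rsi listing above.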

That's only one half of it though, the other interesting part is the jmp (instead of call) to hand over control to a subroutine without pushing a new return address to the stack (since there's no code after the function call, and the calling function doesn't require its own stack frame).
Indeed. Typically you have pairs of call and ret, where call pushes a return address onto the stack and ret pops it off again

     caller                           caller
       |                                ^
     (call) . . . . . . . . . . . . . (ret)
       |                                |
       V                                |
       outer_function      outer_function
              |            ^
           (call) . . . .(ret)
              |            |
              V            |
              inner_function
A jmp does not modify the stack, so when the inner function executes ret it jumps right back to caller

     caller              caller
        |                  ^
      (call) . . . . . . (ret)
        |                  |
        v                  |
        outer_function     |
              |            |
            (jmp)          |
              |            |
              V            |
              inner_function
This trick stops working as soon as outer_function needs stack space for local variables that must survive the call, or does anything other than return the exact return value of inner_function. In that case it needs its own stack frame and a real call.
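
A rough C sketch of the three situations, in case it helps. All function names here are hypothetical stand-ins for the boxes in the diagrams. Whether a given compiler actually emits a jmp depends on the optimization level and target, and with the definitions visible like this it may simply inline everything instead.

```c
/* Hypothetical callee; stands in for inner_function in the diagrams. */
int inner_function(int x) {
    return x * 2;
}

/* Hypothetical callee that reads through a pointer supplied by its caller. */
int inner_reads(const int *p) {
    return *p + 1;
}

/* Tail position: nothing is left to do after the call, so the
 * compiler may replace call + ret with a plain jmp. */
int outer_tail(int x) {
    return inner_function(x);
}

/* Not a tail call: the addition has to run after inner_function returns,
 * so control must come back here first. */
int outer_adds_one(int x) {
    return inner_function(x) + 1;
}

/* Not a tail call either: tmp lives in this function's stack frame,
 * which must stay alive while inner_reads dereferences the pointer. */
int outer_with_local(int x) {
    int tmp = x * 3;
    return inner_reads(&tmp);
}
```

Compiling with something like gcc -O2 -S and comparing the three outer functions is an easy way to see which ones end in jmp and which ones keep the call.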
Since the times of K&R, most C calling conventions are defined so that you can call a function with more arguments than it expects. While the ISO C standard does not permit such calls in strictly conforming code, support for them is still ubiquitous.

[One very common use of this is that the C libraries on common desktop systems call main with three (or more!) arguments, and this works whether the programmer has declared an int main(void), an int main(int argc, char **argv), or an int main(int argc, char **argv, char **envp). I once again have to say that the third form, with an envp argument, is not sanctioned by either ISO C or POSIX, and that a C implementation could definitely use some special handling for an external-linkage function called “main” to allow the first two to be used.]

In practical terms, a function call goes along the lines of “pack the first few arguments into caller-saved registers[1] (using more or less complicated rules), then the rest on the stack, then do a call (using either the stack or the link register depending on the architecture), then after returning pop off the stack part of the arguments (the register part being assumed clobbered and not requiring any cleanup).” This kind of convention is called “caller cleanup” (in contrast to “callee cleanup” conventions—like non-vararg __stdcall on Win32—which have the callee pop the arguments off). While you could imagine other ways to permit extra arguments, it’s certainly the most common one.
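
As a concrete illustration, assuming the x86-64 System V convention (where the first six integer or pointer arguments travel in rdi, rsi, rdx, rcx, r8 and r9), a function with eight integer arguments gets its last two on the stack. The functions sum8 and call_sum8 are invented for this sketch, and the comments describe what a typical compiler does rather than anything the C language promises.

```c
/* Eight integer arguments: two more than fit in the six integer
 * argument registers of the x86-64 System V convention. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h) {
    /* a..f typically arrive in rdi, rsi, rdx, rcx, r8, r9;
     * g and h are read from stack slots set up by the caller. */
    return a + b + c + d + e + f + g + h;
}

long call_sum8(void) {
    /* The caller stores 7 and 8 on the stack, loads 1..6 into
     * registers, performs the call, and afterwards releases the
     * stack slots itself ("caller cleanup"). */
    return sum8(1, 2, 3, 4, 5, 6, 7, 8);
}
```

Because the callee never pops anything, it does not need to know how many stack arguments the caller actually pushed, which is what lets extra arguments go unnoticed.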

That, then, implies that a C function can tail call (jump to) any function whose argument tuple is a prefix of its own. Good compilers will recognize this (and unlike you giving a function extra arguments, they are within their standard-given rights to do so!). If you want proper tail calls to functions of other types... that might be possible, but it’s much more gnarly. See for example the description of the musttail attribute in the Clang documentation[2].

[1] For each machine register, a calling convention defines whether the callee has to preserve it (“callee-saved”) or whether the caller has to save it if it cares about what’s in it (“caller-saved”).

[2] https://clang.llvm.org/docs/AttributeReference.html#musttail

I'm not sure if there's a particular name for it, but you could consider it to be a special case of a tail call optimisation https://en.wikipedia.org/wiki/Tail_call