Almost correct.

If you want to call C code that conforms to the x86-64 System V (SysV) ABI, RSP must be 16-byte aligned at the moment you execute the call instruction. If the code you generate never calls foreign code, 8-byte alignment is enough.

Since 8 bytes are already occupied by the return address pushed by the call that entered your function, you need to decrease RSP by a further 8, 24, 40, 56, 72, ... bytes (any amount congruent to 8 mod 16, counting any registers you push yourself) before calling code generated by others.
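The arithmetic above can be sketched as a small helper a code generator might use to pick its frame size. This is only an illustration: `aligned_frame_size` is a made-up name, and it assumes the prologue adjusts RSP with a single sub and pushes nothing else (no saved RBP or callee-saved registers).

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of any real compiler.
 * Returns how many bytes to subtract from RSP in the prologue so that
 * RSP is 16-byte aligned at every call made from the function body.
 *
 * Assumption: on entry RSP % 16 == 8 (the caller was aligned and its
 * call pushed an 8-byte return address), and the prologue pushes no
 * registers of its own.
 */
size_t aligned_frame_size(size_t local_bytes)
{
    /* Smallest value >= local_bytes that is congruent to 8 mod 16. */
    return ((local_bytes + 7) & ~(size_t)15) + 8;
}
```

With 12 bytes of locals this gives 24, and with no locals at all it gives 8. If your prologue also pushes registers, each push moves RSP by another 8 bytes and has to be counted as well.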

Reason: keeping the stack 16-byte aligned makes it easy to allocate 16-byte-aligned stack variables, which is useful because x86-64 has 16-byte registers (SSE) that are most efficiently loaded from and stored to aligned addresses.

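However, it isn't only performance that you lose by neglecting alignment. I learned the hard way that some code generated by gcc crashes if you call it with a misaligned stack, typically because it uses aligned SSE moves such as movaps on stack slots, and those fault on misaligned addresses.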

That's why in this example LLVM allocates 24 bytes, even though 16 would be enough for three 4-byte ints.

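Another example (gcc). foo has no locals at all, yet it still subtracts 8 from RSP purely to realign the stack before the call: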

  extern void bar();

  void foo() {
          bar();
  }

  0000000000000000 <foo>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   e8 00 00 00 00          callq  e <foo+0xe>
   e:   48 83 c4 08             add    $0x8,%rsp
  12:   c3                      retq

To anyone writing x86-64 compilers, I recommend finding the x86-64 SYSV ABI spec and reading it. Saves debugging time.