Hacker News
by ericbb 3685 days ago
I was surprised to read that x64 apparently doesn't allow pushing or popping 32-bit values. I have a language that uses 32 bits as the basic unit for all values and I'm working toward x64 code generation. Should I just promote values to 64 bits and waste half the stack? Should I use mov instructions instead of push/pop? What solutions are other compiler-writers using?
2 comments

x86-64 is really oriented around integer values being 64 bits wide. For example, 32-bit operations zero-extend their result into the full 64-bit integer register. The ABI also passes integer arguments in 64-bit registers or 8-byte stack slots, and the stack is 64-bit aligned on calls.
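To make the zero-extension rule concrete, here is a minimal C sketch. The function name `add32_widen` is made up for illustration. It assumes a typical x86-64 compiler, where the 32-bit add should not need a separate extension instruction, because writing a 32-bit register clears the upper half of the 64-bit register.

```c
#include <stdint.h>

/* Hypothetical helper, not from the thread: add two 32-bit values and
   widen the result to 64 bits. The sum wraps modulo 2^32 first, then the
   conversion to uint64_t fills the upper 32 bits with zeros. */
uint64_t add32_widen(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;   /* 32-bit arithmetic, wraps on overflow */
    return (uint64_t)sum;   /* zero-extension to 64 bits */
}
```

So a language whose values are all 32 bits wide can most likely keep them in 64-bit registers without extra work; what remains is how to lay them out in memory.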

That said, as long as you keep RSP aligned you can do whatever you want. Consider this code:

    extern void value(int* a, int* b, int* c);

    int main() {
      int a, b, c;
      value(&a, &b, &c);
      return a+b+c;
    }

This is how LLVM compiles it:

    subq	$24, %rsp
    leaq	20(%rsp), %rdi
    leaq	16(%rsp), %rsi
    leaq	12(%rsp), %rdx
    callq	value
    movl	16(%rsp), %eax
    addl	20(%rsp), %eax
    addl	12(%rsp), %eax
    addq	$24, %rsp
    retq

Note that the int values are allocated at 4-byte alignment, but rsp is aligned to 8 bytes. If you add a fourth variable, 'd', you'll see that the compiler still allocates 24 bytes of stack, and stores the additional variable at 8(%rsp) (which is unused in the code above).

Almost correct.

If you want to call C code conforming to the x86-64 SYSV ABI, RSP needs to be aligned to 16 bytes when you execute the call. If the code you generate never calls alien code, 8 byte alignment is enough.

Since 8 bytes are occupied by the return address pushed by the call which started your function, you need to decrease RSP by a further 8, 24, 40, 56, 72, ... bytes (that is, 16k + 8) before calling code generated by others, assuming you have pushed nothing else in between.

Reason: keeping the stack 16-byte aligned makes it easier to allocate 16-byte-aligned stack variables, and this is useful because x86-64 has 16-byte registers (SSE) which are most efficiently loaded and stored at aligned addresses.

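However, it isn't only performance that you lose by neglecting alignment. I learned the hard way that some code generated by gcc crashes if you call it with a misaligned stack, typically because it uses aligned SSE moves such as movaps, which fault on misaligned addresses.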

That's why in this example LLVM allocates 24 bytes, even though 16 would be enough for 3 ints: 24 bytes plus the 8-byte return address is 32, a multiple of 16, whereas 16 plus 8 is not.
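The arithmetic behind those numbers can be sketched in C. This is only a sketch, assuming a function that pushes nothing besides its return address and needs `locals` bytes of scratch space. The name `frame_adjust` is hypothetical, and this is not a claim about how LLVM or gcc actually compute their frames.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest amount to subtract from RSP so that
   (a) at least `locals` bytes are available, and
   (b) RSP is 16-byte aligned at the next call instruction.
   Assumes RSP was 16-byte aligned just before the call that entered this
   function, so on entry it is off by the 8-byte return address. */
size_t frame_adjust(size_t locals)
{
    const size_t ret_addr = 8;
    const size_t align = 16;
    /* Round (locals + return address) up to a multiple of 16 ... */
    size_t total = (locals + ret_addr + align - 1) & ~(align - 1);
    /* ... then take the return address back out. */
    size_t adjust = total - ret_addr;
    assert((adjust + ret_addr) % align == 0);
    return adjust;
}
```

Under these assumptions `frame_adjust(12)` gives 24, matching the LLVM listing for three ints, and `frame_adjust(0)` gives 8, the first value in the 8, 24, 40, ... series.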

Another example (gcc):

  extern void bar();

  void foo() {
          bar();
  }

  0000000000000000 <foo>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   e8 00 00 00 00          callq  e <foo+0xe>
   e:   48 83 c4 08             add    $0x8,%rsp
  12:   c3                      retq

To anyone writing x86-64 compilers, I recommend finding the x86-64 SYSV ABI spec and reading it. Saves debugging time.

Ah, good point. I forgot about the return address.

> I was surprised to read that x64 apparently doesn't allow pushing or popping 32-bit values.

A simple way to circumvent this problem (I don't claim it is the best) is

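  sub rsp, 4
  mov [rsp], eax

where eax of course contains the value to push. Note that after an odd number of such pushes rsp is only 4-byte aligned, so it has to be realigned before any call.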
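For readers who want to see the sub/mov idea outside of assembly, here is a small C model of it. It is only a sketch: `Stack32`, `stack32_init`, `push32` and `pop32` are names invented here, a byte array stands in for the machine stack, and `sp` plays the role of rsp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a downward-growing stack with 4-byte slots. */
typedef struct {
    unsigned char mem[64];
    size_t sp;              /* byte offset of the current top of stack */
} Stack32;

void stack32_init(Stack32 *s) { s->sp = sizeof s->mem; }

/* Models: sub rsp, 4 ; mov [rsp], eax */
void push32(Stack32 *s, uint32_t value)
{
    assert(s->sp >= sizeof value);            /* bounds check for the model only */
    s->sp -= sizeof value;
    memcpy(&s->mem[s->sp], &value, sizeof value);
}

/* Models: mov eax, [rsp] ; add rsp, 4 */
uint32_t pop32(Stack32 *s)
{
    uint32_t value;
    assert(s->sp + sizeof value <= sizeof s->mem);
    memcpy(&value, &s->mem[s->sp], sizeof value);
    s->sp += sizeof value;
    return value;
}
```

Values come back in last-in, first-out order, and each one occupies 4 bytes of the model stack rather than 8.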