Or if the OS doesn't support W+X allocation at all, then you can have a bunch of tightly packed pregenerated trampolines in the binary.
    clo_code:
        4C8B1501100000    mov r10, [rel clo_code+0x1008]
        FF25F30F0000      jmp [rel clo_code+0x1000]
        0F1F00            nop3

    # one page away...
    struct clo_slot {
        void (*func)(void* _R10, ...);
        void* data;
    };
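As a rough illustration of how such a page could be pregenerated, here is a minimal C sketch that emits the 16-byte trampoline shown above. The helper names (put_le32, clo_emit_stub, clo_emit_page) and the constants are made up for this example. It assumes that slot i sits exactly one 0x1000-byte page after trampoline i and that both are 16 bytes, in which case every trampoline ends up with the same displacements and the page is just the same 16 bytes repeated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 16-byte stubs, each with a 16-byte clo_slot one page later. */
enum { CLO_PAGE = 0x1000, CLO_STUB_SIZE = 16 };

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Write one trampoline into out. slot_off is the distance in bytes from the
 * start of this trampoline to its clo_slot (func at +0, data at +8). */
static void clo_emit_stub(uint8_t out[CLO_STUB_SIZE], uint32_t slot_off) {
    /* This sketch only handles a slot placed after the stub. */
    assert(slot_off >= CLO_STUB_SIZE);

    /* mov r10, [rip+disp32]: 7 bytes, so rip is stub+7 when disp is applied. */
    out[0] = 0x4C; out[1] = 0x8B; out[2] = 0x15;
    put_le32(out + 3, slot_off + 8 - 7);

    /* jmp [rip+disp32]: 6 bytes, ends at stub+13. */
    out[7] = 0xFF; out[8] = 0x25;
    put_le32(out + 9, slot_off - 13);

    /* 3-byte nop pads the stub to 16 bytes. */
    out[13] = 0x0F; out[14] = 0x1F; out[15] = 0x00;
}

/* Fill one page with stubs and return how many were written. */
static size_t clo_emit_page(uint8_t *page) {
    size_t n = CLO_PAGE / CLO_STUB_SIZE;
    for (size_t i = 0; i < n; i++)
        clo_emit_stub(page + i * CLO_STUB_SIZE, CLO_PAGE);
    return n;
}
```

The output of something like clo_emit_page would be dumped into the binary's code section at build time, with a zeroed data page of slots placed right after it. At run time only the slots get written, so nothing ever needs to be both writable and executable.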