| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tomp 1994 days ago

Clang will very aggressively optimize with -O2 (or whatever the max speed optimization is). It think gcc and MSVC as well.

I was experimenting with a simple indirect-threaded interpreter, which I wanted to write in a "nice" style (1 function per opcode) but have compiled in an "optimized" way (state machine). Playing around in Godbolt, I found out that the compilers managed to detect & optimize even indirect tail calls, i.e. tail calls to function pointers. This was indeed on a toy example (i.e. like 5 functions, not a real interpreter with 100 different opcodes) but still truly amazing.

The reason for this was, I was trying to replicate LuaJIT2 - Mike Pall was complaining that the reason interpreters written in C are slow is, that the compiler cannot optimize & register-allocate a large function containing a switch statement for 100 opcodes. Instead, he wrote the interpreter in asm, and made sure all important variables are kept in registers. My thesis was that we could do the same thing by having a bunch of functions tail-calling each other, with all important variables as parameters (which would be optimized into registers). Better, actually, because the compiler could improve register allocation within each "opcode"/function.