by naasking 1257 days ago
> My experience working on this problem led me to the same conclusion as Mike Pall, which is that compilers do not do well with this pattern

Note that that message is from twelve years ago. A lot's changed since then, not just in compilers but in CPUs. Branch prediction is a lot better now.

Mike's primary complaint is bad register allocation. It is very important to keep the most important state consistently in registers. In my experience, compilers still struggle to do good register allocation in big and branchy functions.

Even perfect branch prediction cannot solve the problem of unnecessary spills.

Very true. I imagine that grouping instructions that use the same registers into their own functions would help with that (arithmetic expressions tend to generate sequences like this). Then you loop within this function while the next instruction is in the same group, and only return to the outer global instruction loop otherwise. If you design the bytecode carefully, you can probably do group checks with a simple bitmask.
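
To make the grouping idea concrete, here is a minimal sketch in C. Everything in it is hypothetical and invented for illustration: the opcode layout (top two bits of the opcode byte name the group), the names `GROUP_MASK`, `run_arith`, and `run`, and the tiny stack VM itself. It only shows the shape of the technique (an inner loop per group, entered and left via a mask check) and makes no claim about how much it actually helps register allocation on a given compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: the top two bits of each opcode name its group,
   so a group check is a single mask-and-compare. */
enum {
    GROUP_MASK  = 0xC0,
    GROUP_CTRL  = 0x00,
    GROUP_ARITH = 0x40
};

enum {
    OP_HALT = GROUP_CTRL  | 0,
    OP_PUSH = GROUP_CTRL  | 1,  /* followed by one immediate byte */
    OP_ADD  = GROUP_ARITH | 0,
    OP_SUB  = GROUP_ARITH | 1,
    OP_MUL  = GROUP_ARITH | 2
};

typedef struct {
    const uint8_t *pc;
    int64_t stack[64];
    size_t sp;
} VM;

/* Inner loop for the arithmetic group. It is small and touches little
   state, so pc and sp at least have a chance to stay in registers. */
static void run_arith(VM *vm)
{
    const uint8_t *pc = vm->pc;
    size_t sp = vm->sp;

    /* Stay here for as long as the next opcode is in the same group. */
    while ((*pc & GROUP_MASK) == GROUP_ARITH) {
        assert(sp >= 2);
        int64_t b = vm->stack[--sp];
        int64_t a = vm->stack[sp - 1];
        switch (*pc++) {
        case OP_ADD: a += b; break;
        case OP_SUB: a -= b; break;
        case OP_MUL: a *= b; break;
        default: break; /* unassigned opcodes are not handled in this sketch */
        }
        vm->stack[sp - 1] = a;
    }

    /* Write the cached state back before returning to the outer loop. */
    vm->pc = pc;
    vm->sp = sp;
}

/* Outer "global" instruction loop: handles control opcodes itself and
   hands runs of arithmetic opcodes to run_arith. */
int64_t run(const uint8_t *code)
{
    VM vm = { .pc = code, .sp = 0 };

    for (;;) {
        if ((*vm.pc & GROUP_MASK) == GROUP_ARITH) {
            run_arith(&vm);
            continue;
        }
        switch (*vm.pc++) {
        case OP_PUSH:
            assert(vm.sp < 64);
            vm.stack[vm.sp++] = *vm.pc++;
            break;
        case OP_HALT:
        default:
            return vm.sp ? vm.stack[vm.sp - 1] : 0;
        }
    }
}
```

Calling `run` on the byte sequence `OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT` evaluates (2 + 3) * 4. Whether the compiler really keeps `pc` and `sp` in registers inside `run_arith` is something you would have to check in the generated assembly.
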
Does providing a hint to the compiler using the register keyword address the issue sufficiently?
No, most compilers ignore the register keyword, see: https://stackoverflow.com/a/10675111
Nearly. You need `register`, and you also need to pass the variables into (potentially no-op) inline asm. `register int v("eax")` iirc, but it's been years since I did this.

The 'register' keyword is indeed largely ignored, but it has an additional, somewhat documented meaning: 'when this variable goes into inline asm, it needs to be in that register'. Between asm blocks it can live elsewhere (the stack or wherever), but it still gives the regalloc a really clear guide to work from.

It's `register int v asm("eax")`. However, such bindings are very easily elided, especially at higher optimization levels; compilers are very open about this [1].

[1] https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables....
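
For anyone who wants to try the combination discussed above, here is a minimal sketch. It assumes GCC or Clang on x86 and falls back to plain C elsewhere; the function name `bump` is made up for the example. It uses the `__asm__` spelling, which also compiles under strict `-std=c99`; plain `asm` works in the GNU dialects. The binding is only required to hold at the asm statement itself, so treat this as a hint to the allocator rather than a guarantee about where `v` lives in between.

```c
/* Returns x + 1. On x86 with GCC or Clang, v is bound to eax, but only
   at the asm statement; elsewhere the compiler may keep it anywhere. */
int bump(int x)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    register int v __asm__("eax") = x;

    /* Empty (no-op) asm that claims to read and write v. This is the
       one point where v must actually be in eax. */
    __asm__ volatile("" : "+r"(v));

    return v + 1;
#else
    /* Assumption: on other compilers or targets there is no equivalent
       binding, so this sketch just does the arithmetic. */
    return x + 1;
#endif
}
```

For example, `bump(41)` returns 42 on either path; the difference is only visible in the generated assembly.
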

I read a research paper that proved that branch prediction is a non-issue with modern predictors (e.g. TTAGE). It is of course true that register spills happen, but it's not bad enough to justify hand-written assembly. Especially when you simulate AOT-compiled code (e.g. RISC-V and WASM), you will already be 3-10x faster than Lua. For my purposes of using this kind of emulator for scripting, that is fine.

Throw instruction counting into the mix, and you can even be faster than LuaJIT, although I'm not sure how it manages to screw up the counting so badly. I wrote a little bit about it here: https://medium.com/@fwsgonzo/time-to-first-instruction-53a04...