| HN Mirror



	by chc4 2743 days ago
	Every single C compiler worth mentioning will turn a switch statement into a jump table. The difference between a jump table and computed goto is only that one is `jmp [ecx+eax8]` and the other is `mov edx, [ecx+eax8]; jmp edx`. The latter is faster because of weird branch prediction reasons, and because it skips the bounds check that a switch performs.

5 comments

wahern 2742 days ago

> Every single C compiler worth mentioning will turn a switch statement into a jump table

Only if the case values are small integers with a compact range. But if your opcodes are large and/or sparse (more common in non-VM scenarios) compilers don't optimize switch statements very well.

There's a ton of research on switch statement optimization, but most of it doesn't translate to real-world scenarios very well. Compilers will give up fairly quickly on trying to generate a jump table because the work necessary to map the case value to an index at runtime can easily cost more than it's worth in many situations.

Here's some real-world code which you can easily benchmark and compare: https://github.com/wahern/hexdump/blob/master/hexdump.c#L752

Just flip the VM_FASTER macro. Using `hexdump -C < 64MB.random.file > /dev/null`, on my circa 2012 Mac Mini the switch statement is 80% slower than the computed goto, and on my circa 2018 MacBook Pro it's about 15% slower. (See my note elsethread about the post-Haswell branch prediction.) And this is despite the fact that the opcode range is 0-32 without holes!
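To make the comparison concrete, here is a minimal sketch of the two dispatch styles behind a single compile-time toggle. It is not the hexdump.c code: the opcode set, the `run` function, and the `USE_COMPUTED_GOTO` macro are hypothetical stand-ins, and the fast path assumes a compiler with the GNU C labels-as-values extension (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set: dense, 0..3, no holes. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Hypothetical toggle, similar in spirit to the VM_FASTER macro mentioned above. */
#ifndef USE_COMPUTED_GOTO
#define USE_COMPUTED_GOTO 1
#endif

long run(const unsigned char *code) {
    long acc = 0;
    const unsigned char *pc = code;
    assert(code != NULL);

#if USE_COMPUTED_GOTO && defined(__GNUC__)
    /* Labels as values: every handler ends with its own indirect jump.
       There is no bounds check, so an out-of-range opcode is undefined. */
    static void *const table[] = { &&op_inc, &&op_dec, &&op_double, &&op_halt };
#define NEXT() goto *table[*pc++]
    NEXT();
op_inc:    acc += 1; NEXT();
op_dec:    acc -= 1; NEXT();
op_double: acc *= 2; NEXT();
op_halt:   return acc;
#undef NEXT
#else
    /* Portable version: one shared dispatch point at the top of the loop. */
    for (;;) {
        switch (*pc++) {
        case OP_INC:    acc += 1; break;
        case OP_DEC:    acc -= 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return -1; /* unknown opcode */
        }
    }
#endif
}
```

Building once with `-DUSE_COMPUTED_GOTO=0` and once with `-DUSE_COMPUTED_GOTO=1` gives two binaries that can be timed against each other; a program this small will not reproduce the numbers quoted above.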

Also, with computed gotos you can remove the jump table altogether. You can make your opcodes actual addresses (just like native code) if you can make the label addresses visible to the code generator. (See my post elsethread for how to make them visible.)
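A minimal sketch of that idea, assuming GNU C labels as values. The convention used here to expose the labels (calling the interpreter with a NULL program so it hands back its label table) is one common trick and only an assumption; it is not necessarily the approach described elsethread. The names `vm`, `assemble`, and the `T_*` opcodes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbolic opcodes, used only by the code generator. */
enum { T_INC, T_DOUBLE, T_HALT, T_COUNT };

/* Runs a program whose "opcodes" are handler addresses.
   When code is NULL, it instead returns its label table via labels_out. */
long vm(void *const *code, void *const **labels_out) {
    static void *const labels[T_COUNT] = { &&t_inc, &&t_double, &&t_halt };
    long acc = 0;
    void *const *pc = code;

    if (code == NULL) {
        assert(labels_out != NULL);
        *labels_out = labels;
        return 0;
    }

    /* No table lookup at run time: load the address and jump to it. */
    goto **pc++;
t_inc:    acc += 1; goto **pc++;
t_double: acc *= 2; goto **pc++;
t_halt:   return acc;
}

/* Code generator: translate symbolic opcodes into handler addresses once. */
void assemble(const unsigned char *ops, size_t n, void **out) {
    void *const *labels;
    vm(NULL, &labels);
    for (size_t i = 0; i < n; i++) {
        assert(ops[i] < T_COUNT);
        out[i] = labels[ops[i]];
    }
}
```

The trade-off in this sketch is that each instruction grows from one byte to one pointer, in exchange for dropping the table load from every dispatch.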

P.S. While I implemented hexdump.c using a VM mostly for fun, the end result not only outperforms GNU od and BSD hexdump, but IMHO the code is much easier to read, too.


tom_mellior 2743 days ago

> Every single C compiler worth mentioning will turn a switch statement into a jump table.

Yes, but that is not the optimization discussed here. The optimization discussed here is looking ahead to the next bytecode instruction's opcode, using it as a key into a different jump table, and jumping to that target. No C compiler I'm aware of does that.

C compilers could try to match on common patterns in interpreter implementations (a switch within a loop, keyed on certain bits of a piece of data read mostly linearly from an array) and heroically generate a dispatch table, but that would still fail for branches, where you don't want to call DISPATCH because execution does not necessarily proceed to the next instruction in the array.
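One way to read the point about branches: in a hand-written interpreter, a jump handler first redirects the program counter and only then dispatches, so the "next opcode" is not simply the following array element. The sketch below illustrates this under stated assumptions: GNU C labels as values, a hypothetical `B_*` instruction set with one-byte operands, and a hypothetical `run_loop` function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set; B_ADD and B_JNZ take a one-byte operand. */
enum { B_ADD, B_DEC, B_JNZ, B_HALT };

long run_loop(const unsigned char *code, long counter) {
    static void *const table[] = { &&b_add, &&b_dec, &&b_jnz, &&b_halt };
    const unsigned char *pc = code;
    long acc = 0;
    assert(code != NULL);

/* Fetch the opcode at pc and jump straight to its handler. */
#define DISPATCH() goto *table[*pc++]
    DISPATCH();
b_add:
    acc += *pc++;              /* operand: amount to add */
    DISPATCH();                /* falls through to the next instruction */
b_dec:
    counter -= 1;
    DISPATCH();
b_jnz: {
        unsigned char target = *pc++;   /* operand: absolute offset into code */
        if (counter != 0)
            pc = code + target;         /* redirect before dispatching */
    }
    DISPATCH();                /* dispatches on the branch target, not the next slot */
b_halt:
    return acc;
#undef DISPATCH
}
```

For example, the program `{ B_ADD, 5, B_DEC, B_JNZ, 0, B_HALT }` adds 5 once per loop iteration until the counter reaches zero.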


vardump 2743 days ago

HN formatting ate some * from the parent post:

  jmp [ecx+eax * 8]

  and the other is

  mov edx, [ecx+eax * 8]; 
  jmp edx

> The latter is faster because of weird branch prediction reasons...

Things like that are microarchitecture dependent. Might be true on particular CPUs.

Of course, it's possible that a separate MOV could be executed much earlier in the out-of-order pipeline, while the effective address calculation inside a memory-operand JMP might not be. In that case the JMP target (edx) would already be resolved by the time the JMP actually executes.


dawkins 2743 days ago

Do you know if the Go compiler does this too?

Edit: Never mind, it doesn't: https://github.com/golang/go/issues/5496


ape4 2743 days ago

When you code a `switch` you're telling the compiler what you want. So you get decades of compiler optimization working for you.
