| > That code has a lot of branching. The switch statement has to jump to the corresponding case, the break statement branches to the bottom, and then there is a third branch to get back to the top of the while loop. Three branches just to hit one instruction. That's a bit unfair. Not all branches are equal. Only the instruction-fetch branch, the one that depends on which VM instruction comes next, is going to be mispredicted often. Correctly predicted branches, like the while loop's, aren't that expensive; mispredicted branches cost 10-20x more. Of course, fewer branches and less code in general is better. One big issue with writing interpreters in C/C++ is that the compiler's register allocation usually can't follow the data flow, so it keeps unnecessarily loading and storing the same frequently used variables from and to memory. Interpreters also need to be careful not to exceed the L1 code cache, typically 32 kB. All this means that to write a truly efficient interpreter, you'll need to do it in assembler. The step after that is to write a simple JIT that does away with data-dependent (= VM instruction) branches altogether. Then you'll notice you don't need to update some VM registers every time, but can coalesce, for example, program counter updates to certain points. Eventually you'll find you have a full-fledged JIT compiler doing instruction scheduling, register allocation, etc. I've been down that rabbit hole, except for the last step. That's where it becomes a true challenge. The LuaJIT project (http://luajit.org/) followed it all the way through, and studying it is a great resource for anyone interested in the topic. Kudos to Mike Pall. |
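
To make the branch counting concrete, here is a minimal sketch in C of the kind of dispatch loop being discussed. It assumes a hypothetical four-opcode stack VM; the opcodes and the names `run_switch` and `run_threaded` are invented for illustration. The first function is the switch-based loop, with comments marking where each branch comes from. The second uses the GCC/Clang "labels as values" extension (computed goto), a technique the comment doesn't name, which puts a copy of the dispatch jump at the end of every handler so there is no `break` and no loop back-edge. It is shown only for contrast and falls back to the switch version on other compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy bytecode, not taken from any real VM. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Switch dispatch. Compiled naively, each VM instruction costs:
   (1) an indirect jump through the switch's jump table to the case,
   (2) a jump from `break` to the bottom of the switch, and
   (3) the back-edge to the top of the loop.
   Optimizing compilers usually fold (2) and (3) into one jump and add a
   range check on the opcode, but (1) remains, and it is the data-dependent
   branch that is hard to predict. */
int64_t run_switch(const int64_t *code) {
    int64_t stack[64];
    size_t sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      return -1; /* unknown opcode */
        }
    }
}

#if defined(__GNUC__)
/* Computed goto (GCC/Clang extension): every handler ends with its own
   indirect jump, so there is no shared dispatch point, no break and no
   back-edge. Note: no range check, so an unknown opcode is undefined
   behavior here. */
int64_t run_threaded(const int64_t *code) {
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    int64_t stack[64];
    size_t sp = 0, pc = 0;
#define DISPATCH() goto *labels[code[pc++]]
    DISPATCH();
do_push: stack[sp++] = code[pc++]; DISPATCH();
do_add:  sp--; stack[sp - 1] += stack[sp]; DISPATCH();
do_mul:  sp--; stack[sp - 1] *= stack[sp]; DISPATCH();
do_halt: return stack[sp - 1];
#undef DISPATCH
}
#else
/* Compilers without the extension fall back to switch dispatch. */
int64_t run_threaded(const int64_t *code) { return run_switch(code); }
#endif
```

Both functions run the program `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 7, OP_MUL, OP_HALT }`, which computes (2 + 3) * 7 and returns 35. Neither version removes the data-dependent indirect jump itself, and neither controls which VM state the compiler keeps in registers; those are the problems the comment says push you toward assembler and then a JIT.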