|
|
|
|
|
by nelhage
459 days ago
|
|
(author here) > The problem is 95% about laying out the instruction dispatching code for the branch predictor to work optimally. A fun fact I learned while writing this post is that that's no longer true! Modern branch predictors can pretty much accurately predict through a single indirect jump, if the run is long enough and the interpreted code itself has stable behavior! Here's a paper that studied this (for both real hardware and a certain simulated branch predictor): https://inria.hal.science/hal-01100647/document My experiments on this project anecdotally agree; they didn't make it into the post but I also explored a few of the interpreters through hardware CPU counters and `perf stat`, and branch misprediction never showed up as a dominant factor. |
|
Conclusion: the rise of popular interpreter-based languages lead to CPUs with smarter branch predictors.
What's interesting is that a token threaded interpreter dominated my benchmark (https://github.com/vkazanov/bytecode-interpreters-post/blob/...).
This trick is meant to simplify dispatching logic and also spread branches in the code a bit.