by lioeters 2259 days ago
> forth interpreters..have terrible interaction with branch prediction - they will never perform well on modern CPUs.

Interesting statement. It led me to this paper:

Branch Prediction and the Performance of Interpreters - Don’t Trust Folklore (2015)

https://hal.inria.fr/hal-01100647/document (pdf)

"..Many studies go back to when branch predictors were not very aggressive. Folklore has retained that a highly mispredicted indirect jump is one of the main reasons for the inefficiency of switch-based interpreters."

"The accuracy of branch prediction on interpreters has been dramatically improved over the three last Intel processor generations. This..has reached a level where it cannot be considered as an obstacle for performance anymore."

2 comments

There is an older paper by Anton Ertl that already showed performance varying for the same implementation technique from one generation of Pentium to the next, and of course between AMD and Intel.

Personally, I stopped worrying and used the most convenient implementation for my use case (a portable interpreter written in C). Your set of primitives and how you code your Forth programs usually offer much more room for improvement.

That paper refers to interpreters in the traditional sense. However, threaded code is not interpreted in the same way: every instruction ends with its own computed jump to the next one. The jonesforth code includes benchmarks on this that you can actually run, and you will observe the exact problem there.
Note that the paper does compare a switch-based dispatcher to a computed-goto version ('jump threading'), cf. figure 2. The latter used to have a significant performance advantage over the former, which is apparently no longer true (cf. figure 3 (a)). That of course doesn't invalidate your point.
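
For anyone who hasn't seen the two dispatch styles side by side, here is a minimal sketch in C. It is not taken from the paper or from jonesforth: the opcodes, `run_switch`, `run_threaded` and `sample_program` are all made up for illustration, and the second version relies on the GNU C "labels as values" extension (gcc and clang), so it is not portable ISO C. It also looks handlers up in a table, which is closer to token threading than to jonesforth's indirect threading in assembly.

```c
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Sample program: (2 + 3) * 4 */
const long sample_program[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT
};

/* Switch-based dispatch: every instruction returns to the top of the
   loop, so a single shared indirect jump selects all handlers. */
long run_switch(const long *code)
{
    long stack[16];
    size_t sp = 0, pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++];           break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp];   break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp];   break;
        case OP_HALT: return stack[sp - 1];
        default:      return -1; /* unknown opcode */
        }
    }
}

/* Computed-goto dispatch: each handler ends with its own indirect jump,
   so the predictor sees one jump site per instruction.
   Assumes GNU C (&&label and goto *ptr). */
long run_threaded(const long *code)
{
    /* Order must match the opcode enum above. */
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[16];
    size_t sp = 0, pc = 0;

#define NEXT() goto *labels[code[pc++]]
    NEXT();
do_push: stack[sp++] = code[pc++];          NEXT();
do_add:  sp--; stack[sp - 1] += stack[sp];  NEXT();
do_mul:  sp--; stack[sp - 1] *= stack[sp];  NEXT();
do_halt: return stack[sp - 1];
#undef NEXT
}
```

Both `run_switch(sample_program)` and `run_threaded(sample_program)` return 20. A program this small says nothing about performance; to see any difference between the two you would have to time a long-running loop on the specific CPU you care about.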