Hacker News new | ask | show | jobs
by ergeysay 972 days ago
Actually, when I first saw this, I set out to investigate. I managed to reproduce the same performance in an isolated environment and I has been sitting on a post on how this works for quite a time. Hopefully, I will have time to finish it this weekend.

Yes, it is that fast.

Long story short, the trick is a completely novel way of implementing interpreters.

It is tree-based and I should stress that the tree is not an AST tree but a highly specialized tree-based representation produced from the AST tree. A node is basically equivalent to an "instruction" in a bytecode-based interpreter. Usual arguments against tree-based interpreters do not apply: there is no pointer-chasing, since the nodes sit tightly in the CPU cache. Dispatch overhead is just a virtual function call so should be comparable to a computed goto-based interpreter.

The nodes operate on machine-native primitive types and the values are passed around between nodes in registers most of the time - oftentimes the interpeter doesn't even need to access the main memory. There is no encoding or decoding of values and no typechecking during evaluation. The interpreter even goes at lengths to avoid conversion of values in most paths, i.e. if your code operates on uint32_ts, they will be passed around as such up until a node that requires them to be cast, effectively creating typed execution paths.

Before the interpreter there is also a "compilation" phase which performs a ton of optimizations, of which I guess the most interesting is the node fusion. As you may guess, it replaces smaller nodes with larger and more specialized ones. In my tests I implemented it for a naive AST-walker-like intepreter for a 3x performance improvement.

I can go on but I better finish the post.

I would also add that writing an interpreter in this style is far simpler than writing even a simplest JIT, but it does impose certain requirements on the language itself.

All in all, the design is nothing short of genius.

1 comments

Interesting, thanks for the info. So as it seems the fact that no type checking/conversion is necessary as in dynamic languages has a positive influence on performance; but I would expect significantly less than factor two. The special AST therefore means that the interpreter is closer to a bytecode than an AST interpreter and doesn't suffer from the drawbacks of an AST interpreter, but then assumingly is not repsonsible for the measured speed-up (but instead avoiding a speed-down). The mentioned optimizations must then be responsible for the speedup, but actually also LuaJIT has many optimizations.