Hacker News new | ask | show | jobs
by isaacimagine 1629 days ago
So I was talking about threaded code with Shaw the other day, and he gave me a long explanation about the various tradeoffs being made and the resulting techniques used in MiniVM. As I understand it, MiniVM is basically threaded code, with cached opcodes. Essentially, it uses precomputed gotos:

    // classical computed goto
    goto *x[y[z]]
    // MiniVM precomputed goto
    goto *a[z]
This optimization makes MiniVM a tad faster than raw computed gotos, because it's essentially threaded code with some optimizations. He can probably expand on this; if you want to hear the full really interesting explanation yourself, he's pretty active in the Paka Discord[0].

[0]: https://discord.gg/UyvxuC5W5q

1 comments

That feels like the sort of optimisation that wins in small cases and loses in larger ones. The jump array is going to be 4 or 8 times the size of the opcode array so will put a lot more pressure on the caches as your programs get larger.
Years ago I wrote a small toy interpreter based on predereferenced computed gotos with surprisingly good speed. For very small programs without dynamic data allocation churn it could run as fast as 3 times slower than compiled code instead of the expected 10X slower. Good things happen speed-wise when all the byte code and data fits into the L1 cache. Also, branch prediction is near perfect in tight loops.
Sounds like a great test case for using a tensorflow-lite model to switch between both techniques.