Hacker News new | ask | show | jobs
by aardvark179 1629 days ago
That feels like the sort of optimisation that wins in small cases and loses in larger ones. The jump array is going to be 4 or 8 times the size of the opcode array so will put a lot more pressure on the caches as your programs get larger.
2 comments

Years ago I wrote a small toy interpreter based on predereferenced computed gotos with surprisingly good speed. For very small programs without dynamic data allocation churn it could run as fast as 3 times slower than compiled code instead of the expected 10X slower. Good things happen speed-wise when all the byte code and data fits into the L1 cache. Also, branch prediction is near perfect in tight loops.
Sounds like a great test case for using a tensorflow-lite model to switch between both techniques.