Hacker News new | ask | show | jobs
by sifoobar 2743 days ago
For the longest time, I swore by computed goto as well. But it has its share of problems; it's not very portable; it forces the code into a rigid, non-extendable format; and it's not as efficient as commonly assumed. I'm far from the first person to notice [0], so don't bother shooting the messenger.

My latest project [1] simply calls into a struct via a function pointer for each instruction. With a twist. Since it returns the pointer to the next operation and uses longjmp to stop the program, which means I can get away without lookups and without a loop condition. And as an added bonus the code is much nicer to deal with and more extendable, since the set of operations isn't hardcoded any more.

Comparing language performance is tricky business, but it mostly runs faster than Python3 from my tests so far. My point is that computed goto is not the end of the story.

I'll just add that there are many different ways to write a VM; the one posted here is very low level, the one linked below reasonably high level. Using physical CPUs as inspiration is all good, but there's nothing gained from pretending in itself unless you're compiling to native code at some point.

[0] http://www.emulators.com/docs/nx25_nostradamus.htm

[1] https://gitlab.com/sifoo/snigl

3 comments

> so don't bother shooting the messenger

OK, let's shoot the original source: It only found that a certain unnamed version of GCC ten years ago performed tail merging, which does indeed defeat the purpose of the optimization. Without the author bothering to turn off tail merging in this case (as Python does), this doesn't mean much.

> Comparing language performance is tricky business, but it mostly runs faster than Python3 from my tests so far.

Presumably you not only have a different interpretation approach, but also a different object model, approach to boxing numbers, and garbage collector. "Tricky business" is a bit of an understatement ;-)

Better him than me. I wouldn't be so quick to dismiss relevant coding experience, it doesn't get old. Much of his reasoning about locality still holds, to an even greater degree today.

I've implemented more or less the same interpreter using computed goto and various variations on dispatch loops; and this is the fastest solution I've managed to come up with, and the nicest code to work with. Still too many parameters, my point is that I have plenty of experience pointing in that direction which is better than nothing. It's not a mystery to me. Using computed goto will involve some kind of lookup to get the relevant labels, I'm using pointers to actual instructions as jump targets. And since I'm longjmp'ing out, which means I don't need a loop condition; the loop is reduced to a single regular goto.

Like I said, better than nothing. There is no such thing as perfection in this world. I've spent quite some time [0] micro benchmarking different features to get a fair comparison.

[0] https://gitlab.com/sifoo/snigl/tree/master/bench

Oh, I wasn't dismissing experience in general, nor was I questioning your experiences. I would love to see your numbers on the different dispatch methods you tried.

I was only questioning that one blog post you linked that lazily said (paraphrasing) "computed-goto dispatch will be undone by the compiler, so don't bother". Others have posted numbers in this thread (https://news.ycombinator.com/item?id=18679477) showing that this information is at best outdated.

No harm done; I don't do pride any more, it kept getting in the way of truth. But there's a clear tendency, especially on HN if you ask me; to blindly trust research by whatever authority and beat people over the head with so called best practices; while completely disregarding personal experience.

Sometimes the reason no one finds a better solution for a long time is that no one believes it's possible. Let's say it's 50/50, believing it's possible is still the superior alternative.

I have no idea what Python uses for regular dispatch; the linked thread just says that their computed goto solution is faster, which comes as no surprise given that was the reason they switched.

I posted the link since I learnt a lot from it, and since it echoes my own experience.

The way I ended up structuring my opcodes was similar to your way but a little different still. I ended up using function pointers but an array of them. Each opcode is defined in an enum and will be assigned to each funtion pointer based on it's corresponding array index. When the virtual CPU reads an opcode it just calls the function pointed to by the pointer at that index.

I found doing it this way makes it easy to change or add opcodes without needing to go back through a giant switch statement to find and rearrange everything. As a bonus too, the assembler just uses the same enum and basically just works through it the opposite way the CPU does. Translating keywords to the matching opcode index in the array then writing the index to the correct memory address in the binary. This also means any time I update an opcode or add one in the virtual machine all I have to do is add it to the enum and both the assembler and virtual machine will be updated without having to do anything else.

Nice to hear! That's what I try to tell people who are asking for the right type theory book or the right dispatch method or whatever before even having a look. It's the same old problem solving; and here are always multiple solutions, each with it's own set of compromises. Claiming the one true way of doing anything related to software is delusional. We're barely scratching the surface.

Edit: While we're here; may I ask how you break out of the dispatch loop? Do you test an end-condition on each iteration? If that's the case, the longjmp trick might be worth trying. The condition will mostly be false, so it might make sense to pay more when it happens instead. You may find the essence of it in sgl_run at the bottom of sgl.c [0].

[0] https://gitlab.com/sifoo/snigl/blob/master/src/snigl/sgl.c#L...

> Edit: While we're here; may I ask how you break out of the dispatch loop? Do you test an end-condition on each iteration? ...

For what it is worth my JITter uses mprotect from other thread to change page permissions to generate an exception to break out of JITted code. Same would work for dispatcher as well.

The advantage is it's completely free performance wise while the code is running. The need for signal handler is less awesome.

> Since it returns the pointer to the next operation and uses longjmp to stop the program, which means I can get away without lookups and without a loop condition.

That threaded approach would be even faster using computed gotos. The loop conditional is a huge performance killer. Removing it claws back cycles from using function pointers, but it would benefit the computed goto approach at least much.

Personally I find using function pointers make for difficult to read code. Dispatching opcodes with a compact switch statement (or equivalent computed goto construction) makes it easier for me to see the meat of the engine and how it works. If the logic for an opcode is complex it can be easily pushed into a static function (which may or not be inlined, but at such a juncture readability is the primary concern). But for simple operations I prefer it inline, textually, so it's easier to see all the moving parts at once.