	by Octokiddie 1261 days ago
	Any ideas on why miniwasm performs better on all the benchmarks except "trap," on which it performs decidedly worse?

2 comments

4984 1261 days ago

The benchmarks were run on macOS, and on a trap miniwasm actually executes a debug interrupt; macOS then checks whether the process is being debugged. Wasm3 just calls exit(1) and prints a message.

As to why the rest are faster: I spent a lot of time optimizing the interpreter and learning the best way to write interpreters. It's mostly jump threading and Mixed Data.
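For readers unfamiliar with the technique, here is a minimal sketch of threaded dispatch, assuming "jump threading" above refers to ending every handler with its own indirect jump. The toy opcodes and the name `run_threaded` are hypothetical and are not taken from miniwasm or wasm3. The code relies on the GNU "labels as values" extension, so it needs GCC or Clang, and it does not validate opcodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy bytecode, not the instruction set of any real interpreter. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs `code` and returns the value on top of the stack at OP_HALT.
 * Uses the GNU "labels as values" extension (GCC and Clang).
 * Opcodes are not validated; a byte outside the enum is undefined behavior. */
int64_t run_threaded(const uint8_t *code)
{
    /* One label address per opcode, indexed by opcode value. */
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    int64_t stack[64];
    size_t sp = 0;              /* number of values currently on the stack */
    const uint8_t *pc = code;

    /* Each handler ends with its own indirect jump, so there is one jump
     * site per opcode instead of a single shared switch. */
#define NEXT() goto *dispatch[*pc++]

    NEXT();

do_push:
    assert(sp < 64);
    stack[sp++] = (int8_t)*pc++;    /* one-byte signed immediate */
    NEXT();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
```

Calling `run_threaded` on the bytes `OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT` evaluates (2 + 3) * 4.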


titzer 1260 days ago

I found that most Wasm interpreters are not particularly good at calls. Wizard is not as fast as wasm3 or wamr in raw speed, but is much faster on calls, particularly because it does not copy arguments (value stacks can be overlapped). But Wizard's primary motivation is to be memory efficient, so it interprets in-place. It also supports instrumentation.
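The idea of overlapping value stacks can be sketched roughly as follows. This is not Wizard's code; `ValueStack`, `vs_push`, `vs_call` and `mul_add_one` are hypothetical names for illustration. The assumption is a single shared array in which the callee's frame starts at the caller's pushed arguments, so the arguments become the callee's first locals without being copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 256

/* Hypothetical shared value stack; all frames live in this one array. */
typedef struct {
    int64_t slots[STACK_SLOTS];
    size_t sp;                  /* index of the first free slot */
} ValueStack;

void vs_push(ValueStack *vs, int64_t v)
{
    assert(vs->sp < STACK_SLOTS);
    vs->slots[vs->sp++] = v;
}

/* A callee receives a pointer to its frame base. The arguments the caller
 * pushed are already locals[0..nargs-1]; nothing is copied. */
typedef int64_t (*Callee)(ValueStack *vs, int64_t *locals);

/* Calls `fn` with the top `nargs` values as arguments, then replaces them
 * with the single result. */
void vs_call(ValueStack *vs, Callee fn, size_t nargs)
{
    assert(vs->sp >= nargs);
    size_t base = vs->sp - nargs;       /* callee frame overlaps caller's top */
    int64_t result = fn(vs, &vs->slots[base]);
    vs->slots[base] = result;           /* result lands where arg 0 was */
    vs->sp = base + 1;
}

/* Example callee: computes a*b + 1, using stack space above its arguments
 * as scratch. */
int64_t mul_add_one(ValueStack *vs, int64_t *locals)
{
    vs_push(vs, locals[0] * locals[1]);     /* temporary above the args */
    return vs->slots[vs->sp - 1] + 1;
}
```

The caller's values below the arguments are untouched by the call, and the result replaces the arguments in place, which is what a Wasm call expects of the operand stack.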

Nice work!


fwsgonzo 1261 days ago

Don't take this as anything other than speculation: I wonder if wasm3 is using musttail with opaque function calls in the instruction handlers. It will demolish performance, which is why I am only using computed gotos in mine (when available). Even switch-case is faster than musttail when you have to leave the tco-jumps. Which is (as an example) why one should not measure performance by fibonacci number generation. :)


haberman 1261 days ago

> I wonder if wasm3 is using musttail with opaque function calls in the instruction handlers. It will demolish performance, which is why I am only using computed gotos in mine (when available). Even switch-case is faster than musttail when you have to leave the tco-jumps.

This doesn't match with my experience. After working on this problem a lot, I came to the conclusion that musttail with opaque function calls is one of the best ways of getting good code out of the compiler: https://blog.reverberate.org/2021/04/21/musttail-efficient-i...
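A minimal sketch of musttail-based dispatch, for context. The opcodes, the `Handler` type and `run_tail` are hypothetical and are not taken from the linked post or from wasm3. The `MUSTTAIL` macro assumes a compiler that reports the attribute through `__has_attribute` (recent Clang, and GCC from version 15); elsewhere it expands to nothing, in which case the tail call is no longer guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Use musttail where the compiler supports it; otherwise fall back to a
 * plain return, which is NOT guaranteed to be compiled as a jump. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical toy opcodes operating on a single accumulator. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Every handler has the same signature, which musttail requires.
 * Interpreter state travels in arguments, so it can stay in registers. */
typedef int64_t Handler(const uint8_t *pc, int64_t acc);
Handler op_inc, op_double, op_halt;

static Handler *const table[] = { op_inc, op_double, op_halt };

/* Jump to the handler of the next opcode without growing the C stack. */
#define DISPATCH() MUSTTAIL return table[pc[0]](pc + 1, acc)

int64_t op_inc(const uint8_t *pc, int64_t acc)    { acc += 1; DISPATCH(); }
int64_t op_double(const uint8_t *pc, int64_t acc) { acc *= 2; DISPATCH(); }
int64_t op_halt(const uint8_t *pc, int64_t acc)   { (void)pc; return acc; }

/* Starts execution with the accumulator set to 0. */
int64_t run_tail(const uint8_t *code)
{
    return table[code[0]](code + 1, 0);
}
```

Each handler is an ordinary function, so the compiler allocates registers per handler rather than across one large interpreter loop.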


fwsgonzo 1261 days ago

I meant having an opaque function inside your instruction handler. My assembly looks like crap if something doesn't get inlined. Because I have no way of achieving this I simply cannot use TCO. It runs fibonacci faster, but anything that uses memory is way worse because it pushes and pops a ton of registers on the instruction handler itself, and not the slow-path opaque function.

An instruction handler here being a dispatch function. It handles a single instruction.

Reading your post, it says so under Limitations: opaque calls trash performance. I guess we agree, but then again I was just reading my assembly, so I had no reason to doubt myself.


haberman 1260 days ago

Yes, our solution was to make all fallback functions into tail calls. It solves the problem, but requires a lot of discipline and can be a bit awkward.
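A rough sketch of what "fallback functions as tail calls" can look like, under stated assumptions: the `Ctx` struct, the opcodes and `load_out_of_bounds` are hypothetical, and this is not the code from the project being discussed. The point is that the hot handler never makes a non-tail call, so the compiler has no reason to save and restore registers around one inside the handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guaranteed tail call where supported; a plain return elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical interpreter context. */
typedef struct {
    const int64_t *mem;
    size_t mem_len;
    int trapped;
} Ctx;

enum { OP_LOAD, OP_ADD1, OP_HALT };

typedef int64_t Handler(const uint8_t *pc, int64_t acc, Ctx *ctx);
Handler op_load, op_add1, op_halt, load_out_of_bounds;

static Handler *const table[] = { op_load, op_add1, op_halt };
#define DISPATCH() MUSTTAIL return table[pc[0]](pc + 1, acc, ctx)

/* Slow path. It shares the handler signature so it can be tail-called.
 * Here it records a trap and stops; a fallback that wanted to continue
 * would end with its own DISPATCH(). */
int64_t load_out_of_bounds(const uint8_t *pc, int64_t acc, Ctx *ctx)
{
    (void)pc; (void)acc;
    ctx->trapped = 1;
    return 0;
}

/* acc = mem[operand byte]; the rare case leaves via a tail call. */
int64_t op_load(const uint8_t *pc, int64_t acc, Ctx *ctx)
{
    size_t idx = pc[0];
    if (idx >= ctx->mem_len)
        MUSTTAIL return load_out_of_bounds(pc, acc, ctx);
    acc = ctx->mem[idx];
    pc += 1;
    DISPATCH();
}

int64_t op_add1(const uint8_t *pc, int64_t acc, Ctx *ctx) { acc += 1; DISPATCH(); }
int64_t op_halt(const uint8_t *pc, int64_t acc, Ctx *ctx) { (void)pc; (void)ctx; return acc; }

int64_t run_vm(const uint8_t *code, Ctx *ctx)
{
    return table[code[0]](code + 1, 0, ctx);
}
```

The awkward part mentioned above shows up in the signature constraint: every slow path has to take and pass along the full interpreter state, even when it uses none of it.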

I recently saw this, which is a very interesting approach for using non-tail-call fallback functions without trashing the code: https://chromium-review.googlesource.com/c/v8/v8/+/4116584


fwsgonzo 1260 days ago

That's very interesting! Thanks!
