by daniellehmann 2847 days ago
Thanks for the warm words, happy to hear that others are interested in this project and WebAssembly in general!

Regarding your question: No, I have not yet contacted the WebAssembly people at Mozilla. But it's definitely a good idea to talk to the implementation experts. Before I do that, I just wanted to collect more "concrete" questions/problems to ask about.

One of those questions is about the performance overhead of the WebAssembly <-> JavaScript interop. In Wasabi, we have a lot of this, because the "analysis hooks" are written in JavaScript and we insert roughly one hook call per original instruction into the wasm binary. Even without any analysis code, just adding these calls can have a runtime overhead >30x. I would like to optimize this, but before that I need to find out where the overhead is coming from. Possible reasons are (just guesswork, input from people working on this is greatly appreciated):

- that many calls are just inherently expensive, be it cross-language or not (possible solution: be more selective about when to insert calls to analysis hooks)
- Wasm <-> JavaScript calls are more expensive than Wasm <-> Wasm ones (possible solution: compile analyses to Wasm, or: hope that this gets optimized better by engines in the future)
- the added instructions inhibit some wasm compiler optimization(s) (e.g., inlining is no longer performed because the function bodies are larger than some threshold)
- ...

So far, I have found working with WebAssembly very pleasant. The spec is compact but still easy to follow. I wrote my own de-/encoder and "high-level representation" of the binary format, which was straightforward and is able to round-trip all test files from the spec repo. The most surprising bit was about validation of dead code (i.e., code after an unconditional br is type checked, but the br is assumed to "produce any possible value").

As for personal communication: I am happy to answer any in-depth question via email or so (see http://software-lab.org/people/Daniel_Lehmann.html).

1 comments

> Even without any analysis code, just adding these calls can have a runtime overhead >30x. I would like to optimize this, but before that I need to find out where the overhead is coming from.

That thought entered my head as soon as I noticed you were instrumenting every instruction.

> - that many calls are just inherently expensive, be it cross-language or not (possible solution: be more selective about when to insert calls to analysis hooks)

This is definitely true, and hard to get around. The wasm instructions will be compiled to machine instructions, and the calls will still be calls, and calls are expensive.

One possible approach to mitigate this cost might be to collect and batch calls into the hook functions. Basically your instrumentation would be a trace-dump of execution and data to some in-wasm memory, and periodically you call out to JS for analysis once the buffer fills up.

This should reduce the call overhead by replacing each hook call with a single write to a well-known location.
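To make the batching idea concrete, here is a minimal sketch in TypeScript. It only models the technique: the typed array stands in for a region of wasm linear memory, and `TraceBuffer`, `record`, `flush`, and `runBatchedTrace` are hypothetical names, not part of Wasabi or any engine API. Real instrumentation would emit wasm stores instead of calling `record`.

```typescript
// Called once per full buffer instead of once per instruction.
type FlushHandler = (records: Uint32Array) => void;

// Hypothetical model of an in-wasm trace buffer. The Uint32Array stands in
// for a reserved region of wasm linear memory.
class TraceBuffer {
  private readonly slots: Uint32Array;
  private used = 0;
  flushCount = 0;

  constructor(capacity: number, private readonly onFlush: FlushHandler) {
    this.slots = new Uint32Array(capacity);
  }

  // Replaces one hook call per instruction with one write and one comparison.
  record(opcode: number): void {
    this.slots[this.used++] = opcode;
    if (this.used === this.slots.length) {
      this.flush();
    }
  }

  // The only place where control crosses over to the analysis code.
  // The handler gets a view into the buffer and must consume it synchronously.
  flush(): void {
    if (this.used === 0) return;
    this.onFlush(this.slots.subarray(0, this.used));
    this.used = 0;
    this.flushCount++;
  }
}

// Example analysis (an assumption for illustration): count executed opcodes.
function runBatchedTrace(
  opcodes: number[],
  capacity: number
): { counts: Record<number, number>; flushes: number } {
  const counts: Record<number, number> = {};
  const buffer = new TraceBuffer(capacity, (records) => {
    records.forEach((op) => {
      counts[op] = (counts[op] || 0) + 1;
    });
  });
  for (const op of opcodes) {
    buffer.record(op);
  }
  buffer.flush(); // drain whatever is left at the end of the run
  return { counts, flushes: buffer.flushCount };
}
```

With a capacity of 2 and five recorded opcodes, the analysis callback runs three times instead of five. A larger buffer trades memory for fewer boundary crossings; whether that pays off in a real engine would need measuring.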

Now, if your analysis functions expect to be able to peek at memory and get a consistent view of memory at the time of the instruction being analyzed, you'll need to do some special magic to re-compute the memory state at that time from the recorded trace, but that can be done on-demand when analysis requires, so 0 cost if the hooks are not present.
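A rough sketch of that on-demand reconstruction, assuming the trace records every store as an address/value pair and that a snapshot of the initial memory is available. The `StoreRecord` layout and the `memoryAt` helper are made up for illustration; a real trace would also have to cover multi-byte stores and memory growth.

```typescript
// One recorded store: which byte address was written, and with what value.
// This record layout is a hypothetical choice for the sketch.
interface StoreRecord {
  address: number;
  value: number;
}

// Rebuild memory as it looked after the first `upTo` recorded stores,
// by replaying them onto a copy of the initial snapshot.
function memoryAt(
  initial: Uint8Array,
  trace: StoreRecord[],
  upTo: number
): Uint8Array {
  const memory = initial.slice(); // copy, so the snapshot stays untouched
  const end = Math.min(Math.max(upTo, 0), trace.length);
  for (const store of trace.slice(0, end)) {
    memory[store.address] = store.value;
  }
  return memory;
}
```

Replaying from the start costs time proportional to the trace length, so an analysis that inspects memory often would probably want periodic snapshots to replay from instead.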

Please note that I'm not sure how well this would work exactly, but it seems promising.

> - Wasm <-> JavaScript calls are more expensive than Wasm <-> Wasm ones (possible solution: compile analyses to Wasm, or: hope that this gets optimized better by engines in the future)

It's getting optimized now. My impression is that the big cost here is marshalling wasm numbers into JS values. I don't know of a good way to avoid this aside from not calling into JS when you can avoid it (i.e. you know there are no analysis hooks attached to something).

I wonder if a simple runtime flag check within wasm, guarding the call-out, would significantly reduce the overhead cost.
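Something like the following is what I have in mind, written as a TypeScript model rather than real wasm. The `hooksEnabled` field stands in for an i32 flag (a wasm global or a fixed memory location) that the inserted code would test before calling out. `HookGate` and its methods are hypothetical names, and whether the check is actually cheaper than the call in a given engine is an open question.

```typescript
type InstructionHook = (opcode: number) => void;

// Hypothetical model of a flag-guarded call-out.
class HookGate {
  // Stands in for an i32 flag that the instrumented wasm code would test.
  private hooksEnabled = 0;
  private hook: InstructionHook | null = null;
  // Number of times the (expensive) call-out was actually made.
  callOuts = 0;

  attach(hook: InstructionHook): void {
    this.hook = hook;
    this.hooksEnabled = 1;
  }

  detach(): void {
    this.hook = null;
    this.hooksEnabled = 0;
  }

  // What the inserted instrumentation would do at each original instruction.
  onInstruction(opcode: number): void {
    if (this.hooksEnabled === 0) return; // cheap check, no boundary crossing
    this.callOuts++;
    if (this.hook !== null) {
      this.hook(opcode);
    }
  }
}
```

While no hook is attached, `onInstruction` returns after a single comparison, so the instrumented code pays for a load and a branch rather than a cross-language call.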

> - the added instructions inhibit some wasm compiler optimization(s) (e.g., inlining is no longer performed because the function bodies are larger than some threshold)

This shouldn't matter too much. Most of the heavyweight compiler optimizations happen before emission to wasm, including a good chunk of inlining. I'm not even sure if Odinmonkey (our Wasm impl) does any extra inlining on top of that - it might just expect the compiler to take care of that.

I'll get in touch. I think you'd get more confident answers on these from the direct WASM crowd. My answers are a bit speculative, and lack concrete details about the latest implementation status.