Hacker News
by mananaysiempre 1011 days ago
Interesting that both of the points you state completely contradict my experience with LuaJIT.

Parsing has always been one of the things its tracing JIT struggled with; it is still faster than the (already fairly fast) interpreter, but in this kind of branch- and allocation-heavy code it gets nowhere near the famed 1.25x to 1.5x of GCC (or so) that you can get by carefully tailoring inner-loopy code.

(But a tracing JIT like LuaJIT is different from a BBV JIT like YJIT, even if I haven’t yet grokked the latter.)

LuaJIT’s FFI calls, on the other hand, are very very fast. They are still slower than not going through the boundary at all, naturally, but that’s about it. On the other hand, going through the Lua/C API inherited from the original, interpreted implementation—which sounds similar to what the Ruby blog post is comparing pure-Ruby code to—can be quite slow.

The SWC situation I can’t understand quickly, but apart from the WASM overhead it sounds to me like they have a syntax tree that the JS plugin side really wants to be GCed in the GC’s memory but the Rust-on-WASM host side really wants to be refcounted in WASM memory, and that is indeed not a good situation to be in. It took a decade or more for DOM manipulation in JS to not suck, and there the native-code side was operating with deep (and unsafe) hooks into the VM and GC infrastructure as opposed to the WASM straitjacket. Hopefully it’ll become easier when the WASM GC proposal finally materializes and people figure out how to make Rust target it.

In any case, it annoys me how hard it is in just about any low-level language to cheaply integrate with a GC. Getting a stack map out of a compiler in order to know where the references to GC-land are and when they are alive is like pulling teeth. I don’t think it should be that way.
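
For anyone unfamiliar with the problem: one common workaround when the compiler won't hand you stack maps is a "shadow stack", where the code registers the address of every local that holds a GC reference by hand. Below is a minimal sketch of that idea in C, not taken from any real runtime. Every name in it (Obj, push_root, gc_collect, shadow_stack_demo, and so on) is hypothetical and made up for illustration, and the collector is a toy mark-and-sweep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy heap object: one outgoing reference plus GC bookkeeping. */
typedef struct Obj {
    struct Obj *child;       /* the only reference field in this toy */
    struct Obj *next_alloc;  /* intrusive list of every allocation */
    int marked;
} Obj;

/* Shadow stack: addresses of the locals that currently hold GC references.
   A real stack map would give the collector this information for free. */
#define MAX_ROOTS 256
static Obj **roots[MAX_ROOTS];
static size_t root_count = 0;

static Obj *all_objects = NULL;
size_t live_objects = 0;

static void push_root(Obj **slot) {
    assert(root_count < MAX_ROOTS);
    roots[root_count++] = slot;
}

static void pop_roots(size_t n) {
    assert(n <= root_count);
    root_count -= n;
}

static Obj *gc_alloc(void) {
    Obj *o = calloc(1, sizeof *o);
    if (!o) abort();
    o->next_alloc = all_objects;
    all_objects = o;
    live_objects++;
    return o;
}

/* Mark everything reachable from o along the child chain (cycle-safe). */
static void mark(Obj *o) {
    while (o && !o->marked) {
        o->marked = 1;
        o = o->child;
    }
}

/* Mark from the registered roots, then free whatever was not reached. */
void gc_collect(void) {
    for (size_t i = 0; i < root_count; i++)
        mark(*roots[i]);
    Obj **link = &all_objects;
    while (*link) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = 0;          /* reset for the next cycle */
            link = &o->next_alloc;
        } else {
            *link = o->next_alloc;  /* unlink and free the garbage */
            free(o);
            live_objects--;
        }
    }
}

/* Returns how many objects survive a collection while `kept` is rooted. */
size_t shadow_stack_demo(void) {
    Obj *kept = NULL;
    push_root(&kept);          /* manual bookkeeping a stack map would replace */
    kept = gc_alloc();
    kept->child = gc_alloc();
    Obj *temp = gc_alloc();    /* never registered, so the collector cannot see it */
    (void)temp;
    gc_collect();              /* frees temp; keeps kept and its child */
    size_t survivors = live_objects;
    pop_roots(1);
    gc_collect();              /* nothing is rooted any more, so everything goes */
    return survivors;
}
```

The cost is visible in the sketch: every reference-holding local has to live in memory and be registered and unregistered by hand, and forgetting one (as with temp here) silently gives the collector licence to free it. That is the sort of overhead and fragility that compiler-emitted stack maps would avoid.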


There is a very big difference between a simple FFI system and the sort of C interface offered by Ruby and Node. Those interfaces allow objects to be passed to the native code, and the native code can then do pretty much anything to the language's runtime state. This is great if you want a C library that can do anything your higher-level language could do, but it also means the JIT has to treat all those calls as impenetrable barriers that cannot be optimised through, so even a small C call can prevent the rest of your application from being optimised.
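
A rough C analogy of the barrier effect, offered only as an illustration and not as a description of how YJIT, TruffleRuby or Node actually model it: once an object is handed to a callee the optimiser cannot see into, any value cached from that object has to be reloaded after the call. All names here (Array, NativeFn, grow, sum_reload, sum_hoisted) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime object with one mutable field. */
typedef struct {
    long len;
} Array;

/* An opaque native entry point: the caller cannot know what it touches. */
typedef void (*NativeFn)(Array *);

/* One possible native function: it mutates the object it was given. */
void grow(Array *a) {
    a->len += 1;
}

/* Correct version: a->len is reloaded after every opaque call. */
long sum_reload(Array *a, NativeFn fn, int iters) {
    assert(a != NULL && fn != NULL);
    long total = 0;
    for (int i = 0; i < iters; i++) {
        fn(a);             /* may have changed anything reachable from a */
        total += a->len;   /* so the field must be read again */
    }
    return total;
}

/* What an optimiser would like to do: hoist the load out of the loop.
   This is only valid if fn is known not to modify a->len. */
long sum_hoisted(Array *a, NativeFn fn, int iters) {
    assert(a != NULL && fn != NULL);
    long cached_len = a->len;  /* loaded once, before the loop */
    long total = 0;
    for (int i = 0; i < iters; i++) {
        fn(a);
        total += cached_len;   /* stale as soon as fn mutates a */
    }
    return total;
}
```

Starting from len = 1 and calling with grow for three iterations, sum_reload returns 9 (2 + 3 + 4) while sum_hoisted returns 3, so the hoisted form is simply wrong unless the callee is known to be harmless. A plain FFI call that only takes and returns primitive values gives the optimiser much more to work with than a call that receives the object itself.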

We got round this in TruffleRuby by running C extensions through an LLVM Bitcode interpreter that was part of the same framework as the Ruby interpreter and allowed them to be JITted together, but that had other downsides, and wasn’t great for things like parsers which had huge switch statements.

Yes, but in this case the TruffleRuby approach would fix the Shopify issue, I think? And if by downside you mean longer warmup times, that's an issue for YJIT or any other JIT too, so how much of a downside it is depends a lot on the nature of the deployment.
Shopify's bigger repos are deployed pretty much every 30 minutes. As you point out, most JITs struggle in these conditions.

But YJIT warms up extremely fast, and is able to provide real-world speedups to these services almost immediately.