by tbodt 2945 days ago
What's the advantage over using LLVM's built-in JIT, or PyPy's JIT, or generating machine code directly, or anything else that doesn't have the overhead of spawning processes for compiling and linking? One of the goals listed is minimizing the JIT compilation time.
5 comments

LLVM clearly wasn't designed for JIT. Don't let those letters "VM" confuse you; it's more like a machine abstraction than a virtual machine. And even that is far from water-tight.

But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.

Can anyone think of any reasonably common stacks using LLVM as a JIT? There's Mono, but that's a non-default mode; not sure if it's typically used. The Python Unladen Swallow experiment failed. WebKit had a short-lived FTL JavaScript optimization pass, but that was replaced by B3.

Which is just a long-winded way to suggest that LLVM is not likely to be ideal as a JIT, at least based on what past projects have done.

(Not trying to imply that writing C to disk is better, but it may well be simpler & more flexible - not worthless qualities for an initial implementation).

I use LLVM as a JIT via Terra [1]. It performs about as well as you'd expect any other C compiler to perform. That is, if you do a bad job of code generation and pass it a multi-MB file in a single function, well then of course it's going to choke. But if you're optimizing tight loops and have reasonable code generation, it's very good and you can get performance comparable to a best-in-class C compiler without the overhead and headache associated with calling out to an external program.

The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has just recently upgraded to 3.8 [2].

But if the alternative is shelling out to a C compiler? I'll take LLVM any day. The issue is not just the overhead of a call to an external program, it's all the extra complexity that comes along with that. It is very, very easy for this approach to break, especially when you consider the breadth of C compilers that exist, and all the possible ways they can be configured. In contrast, LLVM is "just" a library that you link to.

[1]: http://terralang.org/

[2]: http://klee.llvm.org/

I'm a little skeptical about the complexity costs of an external program. You may not need to support all those C compilers, but at least you have the choice. And C is extremely mature and stable. If you're generating code, you probably don't need to use the latest not-so-well supported features; you may well be able to have C code that compiles on almost any compiler from the last 3 decades without too much trouble. And while there will be more configuration choices, it's not like raw LLVM has none.

If anything, I'd bet plain C is much simpler because it hasn't changed much, and is very unlikely to ever do anything very surprising on any future platform - which cannot be said of raw LLVM.

And of course shelling out is a bit of a hassle, but hey; it's a well-trodden path on Unix. It's not the fastest, greatest interop in the world, but it's good enough for a lot of things.
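
For anyone who hasn't seen the "write C to disk and shell out" path, here is a minimal sketch of its shape, assuming a Unix-like box with a compiler reachable as `cc`. The function name `compile_and_load`, the file names, and the generated `add` function are all made up for illustration; this is not how any particular project does it.

```python
import ctypes
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical "generated" code; a real JIT would emit this from its own IR.
C_SOURCE = "int add(int a, int b) { return a + b; }\n"


def compile_and_load(c_source, compiler="cc"):
    """Write C to disk, shell out to a compiler, and load the shared object.

    Returns None if no compiler with that name is found on PATH.
    """
    cc = shutil.which(compiler)
    if cc is None:
        return None
    # The directory is left behind on purpose so the generated C can be inspected.
    workdir = Path(tempfile.mkdtemp(prefix="jit_sketch_"))
    src = workdir / "unit.c"
    lib = workdir / "unit.so"
    src.write_text(c_source)
    # This is the process-spawning overhead discussed above: the compiler
    # driver is started and runs the compile and link steps for us.
    subprocess.run(
        [cc, "-O1", "-shared", "-fPIC", "-o", str(lib), str(src)],
        check=True,
        capture_output=True,
        text=True,
    )
    # Load the freshly built library into the current process.
    return ctypes.CDLL(str(lib))


if __name__ == "__main__":
    handle = compile_and_load(C_SOURCE)
    if handle is None:
        print("no C compiler on PATH")
    else:
        print(handle.add(2, 3))
```

Note how much of this depends on the environment (compiler name, flags, a writable temp directory), which is where the "degrees of freedom" complaints elsewhere in the thread come from.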

(and wow- terra sounds impressive!)

I agree with many of your points, in theory.

I'll just say that my views come mainly from experience, specifically ECL (Embeddable Common Lisp, a CL implementation) and (this was further back, so my memory is fuzzy) a tool for generating executables from Perl scripts. I don't think I'm using an especially unusual setup, or unusual compilers, and I would guess that these tools probably target a very narrow subset of C. Despite this, my experience with these sorts of tools has been anything but "works out of the box". On the contrary, there appear to be a great number of degrees of freedom, even with standard-ish setups, that can trip up these tools. Because of the additional layers of abstraction, the error messages you get are very poor. Some header file is missing or in an unexpected place, or worse some generated code fails to compile. As an end-user, it's basically impossible to debug these in a reasonable way.

You can certainly have internal errors using LLVM, but in my experience fewer of them are platform-dependent. Therefore there is a greater chance that something that works for the developer will work for the user. Also, if error handling is done properly, if a failure does occur it can often be mapped back to the original source program. This is much better as far as usability goes, since the user almost never wants to debug some compiler's generated code.

> The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has just recently upgraded to 3.8 [2].

Yea, it's annoying. For PostgreSQL I've decided to focus on the C API wherever possible for exactly that reason. A bit more painful to write, but not even remotely as fast-moving. Obviously there are parts where that's not possible - but even there I've decided to localize that as much as possible.

I wonder if there will ever be a de facto API wrapper for LLVM. As it is, I'm aware of smaller efforts here and there, but other than SPIR-V [1] I'm not sure any are big enough to have long-term survivability potential. And even with SPIR-V I'm not sure if the momentum is really there or not.

[1]: https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.pdf

I think Apple's bitcode for iOS deployment is also more stable than the actual LLVM bitcode.
> Can anyone think of any reasonably common stacks using LLVM as a JIT?

We just added LLVM based JIT to PostgreSQL. Don't think we have quite the same issues as JITing generic interpreted languages though, because the planner gives us much more information about the likely cost of executing a query. So the need for a super-fast baseline JIT isn't as big.
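
A rough sketch of what "the planner tells us the likely cost" buys you: the decision to JIT at all, and how hard to optimize, can be a simple threshold check on the estimate. The names `JitThresholds` and `choose_jit_level` and the numbers are hypothetical, picked only for illustration; PostgreSQL's own settings (`jit_above_cost` and friends) play a similar role, but this is not its implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JitThresholds:
    # Illustrative numbers only, not a tuning recommendation.
    jit_above: float = 100_000.0
    optimize_above: float = 500_000.0


def choose_jit_level(estimated_cost, thresholds=JitThresholds()):
    """Pick a compilation strategy from the planner's cost estimate."""
    if estimated_cost < thresholds.jit_above:
        # Cheap query: compile time would dominate, so just interpret.
        return "interpret"
    if estimated_cost < thresholds.optimize_above:
        # Mid-sized query: emit code but skip the expensive passes.
        return "jit-unoptimized"
    # Expensive query: the full optimization pipeline can pay for itself.
    return "jit-optimized"


if __name__ == "__main__":
    for cost in (50.0, 250_000.0, 2_000_000.0):
        print(cost, choose_jit_level(cost))
```

A general-purpose language runtime has no such estimate up front, which is why it usually needs a fast baseline tier and profiling instead.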

> But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.

I think that's partially due to people using the expensive default pipeline when using optimization. A lot of those passes either don't make sense for the source language, or don't make sense for the first foreground JIT pass.

The biggest issue I have with LLVM wrt JITing is that its error handling isn't really good enough. It's fine to just fatal error if you're in an AOT compiler world, but that's much less acceptable inside a database. There are moves to make at least parts of LLVM exception safe, but ...

> Can anyone think of any reasonably common stacks using LLVM as a JIT?

PostgreSQL - although I doubt that's the sort of thing you had in mind!

That's a great example! It's pretty much exactly what I was looking for (well, except that it's probably going to be niche, at least for a while?) Still - good example.
Having used LLVM for precisely that for both Open Shading Language (OSL) and my startup’s runtime, LLVM’s JIT was pretty good. It’s certainly not optimized for “just throw everything at it a function at a time and pray” like a more custom-built JIT (like in HHVM) or even nanojit. But its backend output is beyond compare, and you instantly get cross-platform compatibility. As a runtime implementer, it (was) phenomenal.

After LLVM 3.4 or so with the forcible move to “MCJIT” (now ORCJIT maybe?) it suddenly got even more painful though. While the Module system in LLVM was always abused by the JIT, it was a sad day for many of us who instead pinned to 3.4 for a while. I haven’t followed up in a while to see how the newer JITs have progressed, but I believe the last-layer JIT for Safari uses LLVM as well.

tl;dr: for the right time versus execution speed trade-off, LLVM is still awesome.

Safari's last-layer used LLVM until 2016; it's switched to a custom JIT since then (ref: https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...).

Since you have some experience - do you think shelling out would have been much more painful?

See, I’m out of date! :).

Shelling out (which I’ve also done) is okay, but you never get to really teach the backend what you know. That is, no matter how hard you try, you can’t teach gcc, icc, or clang that you know it’s safe to just fetch this function pointer off a struct and that it’s stable. Writing a simple pass in LLVM though is incredibly straightforward. You can even do a simple inliner, that knows how to inline just the runtime callsites you care about.

Like the WebKit folks and the HHVM folks before them: dynamic languages have enough complexity that you often get most of the win from a “basic compilation” (compared to say C/C++) so after you’ve proven out what you need, you roll your own.

Shelling out though would be strictly worse than the LLVM in-memory approach, since it gets you no additional benefit (in some ways it’s harder, since you can’t just say “jump to this address”), you lose a lot of upside (custom passes, letting you tune optimizations and instruction selection beyond simply -O0, -O1, etc.), and then you get to require users to have a compiler on their box.

I’d personally look at nanojit or the other JIT libraries before shelling out to a regular compiler.

Yeah, even LLVM is very slow for a JIT. Ask any project using LLVM as a backend for a JIT and they'll tell you it's a recurring issue. See for example https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...

I know very little about Ruby specifically but IME for this kind of dynamic language you get most of the initial gains by:

- removing (by analysis or speculation) dynamic dispatch

- unboxing / avoiding allocations in the easy cases

Once you've done that, you can generate pretty dumb assembly and still come out way ahead of your interpreter (and avoid very costly optimization / instruction selection / regalloc / scheduling).
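
To make the two bullets above concrete, here is a toy sketch of the transformation, written in Python purely to show the shape (it is obviously not faster in Python). Everything here is hypothetical: `Box`, `ADD_HANDLERS`, `Deopt`, and the function names are invented for the example. The generic path dispatches on type tags and allocates a box per operation; the speculated path guards on "all ints", keeps the accumulator unboxed, and falls back to the generic path if the guard fails.

```python
class Box:
    """A boxed value, roughly as an interpreter might represent it."""
    __slots__ = ("tag", "payload")

    def __init__(self, tag, payload):
        self.tag = tag
        self.payload = payload


# Generic path: a handler lookup by type tags on every single operation.
ADD_HANDLERS = {
    ("int", "int"): lambda a, b: Box("int", a.payload + b.payload),
    ("str", "str"): lambda a, b: Box("str", a.payload + b.payload),
}


def generic_add(a, b):
    return ADD_HANDLERS[(a.tag, b.tag)](a, b)


def sum_generic(values):
    """Interpreter-style loop: dynamic dispatch plus one allocation per step."""
    it = iter(values)
    acc = next(it)  # assumes a non-empty sequence
    for v in it:
        acc = generic_add(acc, v)
    return acc


class Deopt(Exception):
    """Raised when a speculation guard fails."""


def sum_speculated_int(values):
    """What a JIT might emit after only ever observing ints at this site."""
    acc = 0  # unboxed accumulator, no allocation inside the loop
    for v in values:
        if v.tag != "int":  # guard replaces the dynamic dispatch
            raise Deopt()
        acc += v.payload
    return Box("int", acc)  # box once, at the end


def sum_values(values):
    """Try the speculated fast path, fall back to the generic one."""
    try:
        return sum_speculated_int(values)
    except Deopt:
        return sum_generic(values)


if __name__ == "__main__":
    ints = [Box("int", n) for n in (1, 2, 3)]
    strs = [Box("str", s) for s in ("a", "b")]
    print(sum_values(ints).payload, sum_values(strs).payload)
```

The code inside the guarded loop is about as "dumb" as it gets, which is the point: once the dispatch and boxing are gone, there is not much left for a heavyweight optimizer to find.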

Most of what LLVM / GCC do only makes sense when you've got your code down close to whatever you would actually write in C.

They want to check the generated code and let people check it

> The main purpose of this JIT release is to provide a chance to check if it works for your platform and to find out security risks before the 2.6 release

LLVM's JIT is in a sense different from that of, say, PyPy. It's more primitive. When people talk about JIT in the context of LLVM, they mean the set of APIs provided by LLVM's library. That is, give it a set of IR functions, and things they depend on, the library dynamically compiles and links them for you. More concretely, for example, given the IR of a function, it gives you a raw pointer to the compiled version that you can call directly. It takes care of the boring (and often platform dependent) parts efficiently -- code gen, linking, etc -- so that you can focus on generating efficient IR (which is the hard part for a JIT).
Ruby on PyPy already exists: https://github.com/topazproject/topaz

Performance is disappointing, though.