by blattimwind 2945 days ago
> Unlike ordinary JIT compilers for other languages, Ruby’s JIT compiler does JIT compilation in a unique way, which prints C code to a disk and spawns common C compiler process to generate native code.

Oh dear god.


What's the advantage over using LLVM's built-in JIT, or PyPy's JIT, or generating machine code directly, or anything else that doesn't have the overhead of spawning processes for compiling and linking? One of the goals listed is minimizing the JIT compilation time.
LLVM clearly wasn't designed for JIT. Don't let those letters "VM" confuse you; it's more like a machine abstraction than a virtual machine. And even that is far from water-tight.

But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.

Can anyone think of any reasonably common stacks using LLVM as a JIT? There's Mono, but that's a non-default mode; not sure if it's typically used. The Python unladen-swallow experiment failed. WebKit had a short-lived FTL JavaScript optimization pass, but that was replaced by B3.

Which is just a long-winded way to suggest that LLVM is not likely to be ideal as a JIT, at least based on what past projects have done.

(Not trying to imply that writing C to disk is better, but it may well be simpler & more flexible - not worthless qualities for an initial implementation).

I use LLVM as a JIT via Terra [1]. It performs about as well as you'd expect any other C compiler to perform. That is, if you do a bad job of code generation and pass it a multi-MB file in a single function, well then of course it's going to choke. But if you're optimizing tight loops and have reasonable code generation, it's very good and you can get performance comparable to a best-in-class C compiler without the overhead and headache associated with calling out to an external program.

The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has just recently upgraded to 3.8 [2].

But if the alternative is shelling out to a C compiler? I'll take LLVM any day. The issue is not just the overhead of a call to an external program, it's all the extra complexity that comes along with that. It is very, very easy for this approach to break, especially when you consider the breadth of C compilers that exist, and all the possible ways they can be configured. In contrast, LLVM is "just" a library that you link to.

[1]: http://terralang.org/

[2]: http://klee.llvm.org/

I'm a little skeptical about the complexity costs of an external program. You may not need to support all those C compilers, but at least you have the choice. And C is extremely mature and stable. If you're generating code, you probably don't need to use the latest not-so-well supported features; you may well be able to have C code that compiles on almost any compiler from the last 3 decades without too much trouble. And while there will be more configuration choices, it's not like raw LLVM has none.

If anything, I'd bet plain C is much simpler because it hasn't changed much, and is very unlikely ever to do anything very surprising on any future platform - which cannot be said of raw LLVM.

And of course shelling out is a bit of a hassle, but hey; it's a well-trodden path on unix. It's not the fastest, greatest interop in the world, but it's good enough for a lot of things.

(and wow- terra sounds impressive!)

I agree with many of your points, in theory.

I'll just say that my views come mainly from experience, specifically ECL (Embeddable Common Lisp, a CL implementation) and (this was further back, so my memory is fuzzy) a tool for generating executables from Perl scripts. I don't think I'm using an especially unusual setup, or unusual compilers, and I would guess that these tools probably target a very narrow subset of C. Despite this, my experience with these sorts of tools has been anything but "works out of the box". On the contrary, there appear to be a great number of degrees of freedom, even with standard-ish setups, that can trip up these tools. Because of the additional layers of abstraction, the error messages you get are very poor. Some header file is missing or in an unexpected place, or worse some generated code fails to compile. As an end-user, it's basically impossible to debug these in a reasonable way.

You can certainly have internal errors using LLVM, but in my experience fewer of them are platform-dependent. Therefore there is a greater chance that something that works for the developer will work for the user. Also, if error handling is done properly, a failure that does occur can often be mapped back to the original source program. This is much better as far as usability goes, since the user almost never wants to debug some compiler's generated code.

> The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has just recently upgraded to 3.8 [2].

Yea, it's annoying. For PostgreSQL I've decided to focus on the C API wherever possible for exactly that reason. A bit more painful to write, but not even remotely as fast-moving. Obviously there are parts where that's not possible - but even there I've decided to localize that as much as possible.

I wonder if there will ever be a de facto API wrapper for LLVM. As it is, I'm aware of smaller efforts here and there, but other than SPIR-V [1] I'm not sure any are big enough to have long-term survivability potential. And even with SPIR-V I'm not sure if the momentum is really there or not.

[1]: https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.pdf

I think Apple's bitcode for iOS deployment is also more stable than the actual LLVM bitcode.
> Can anyone think of any reasonably common stacks using LLVM as a JIT?

We just added LLVM based JIT to PostgreSQL. Don't think we have quite the same issues as JITing generic interpreted languages though, because the planner gives us much more information about the likely cost of executing a query. So the need for a super-fast baseline JIT isn't as big.

> But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.

I think that's partially due to people using the expensive default pipeline when using optimization. A lot of those passes either don't make sense for the source language, or don't make sense for the first foreground JIT pass.

The biggest issue I have with LLVM wrt JITing is that its error handling isn't really good enough. It's fine to just fatal error if you're in an AOT compiler world, but that's much less acceptable inside a database. There are moves to make at least parts of LLVM exception safe, but ...
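
To illustrate the point about planner cost estimates, here is a minimal Python sketch of a cost-gated JIT decision. The function name `choose_jit_level` and both threshold values are assumptions made up for illustration; this is not PostgreSQL's actual code or configuration.

```python
def choose_jit_level(estimated_cost, jit_above=100_000, optimize_above=500_000):
    """Pick a strategy from a planner-style cost estimate.

    Both thresholds are hypothetical illustration values.
    """
    if estimated_cost < jit_above:
        # Cheap query: compilation time would dominate, so just interpret.
        return "interpret"
    if estimated_cost < optimize_above:
        # Mid-sized query: compile, but skip the expensive optimization passes.
        return "jit-unoptimized"
    # Expensive query: the full optimization pipeline is likely to pay off.
    return "jit-optimized"
```

Because the cost is known before execution starts, a system like this can skip the baseline tier entirely for cheap queries, which is why a super-fast baseline JIT matters less here than for a general-purpose interpreter.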

> Can anyone think of any reasonably common stacks using LLVM as a JIT?

PostgreSQL - although I doubt that's the sort of thing you had in mind!

That's a great example! It's pretty much exactly what I was looking for (well, except that it's probably going to be niche, at least for a while?) Still - good example.
Having used LLVM for precisely that for both Open Shading Language (OSL) and my startup’s runtime, LLVM’s JIT was pretty good. It’s certainly not optimized for “just throw everything at it function at a time and pray” like a more custom-built JIT (like in HHVM) or even nanojit. But its backend output is beyond compare, and you instantly get cross-platform compatibility. As a runtime implementer, it (was) phenomenal.

After LLVM 3.4 or so with the forcible move to “MCJIT” (now ORCJIT maybe?) it suddenly got even more painful though. While the Module system in LLVM was always abused by the JIT, it was a sad day for many of us who instead pinned to 3.4 for a while. I haven’t followed up in a while to see how the newer JITs have progressed, but I believe the last-layer JIT for Safari uses LLVM as well.

tl;dr: for the right time versus execution speed trade-off, LLVM is still awesome.

Safari's last-layer used LLVM until 2016; it's switched to a custom JIT since then (ref: https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...).

Since you have some experience - do you think shelling out would have been much more painful?

See, I’m out of date! :).

Shelling out (which I’ve also done) is okay, but you never get to really teach the backend what you know. That is, no matter how hard you try, you can’t teach gcc, icc, or clang that you know it’s safe to just fetch this function pointer off a struct and that it’s stable. Writing a simple pass in LLVM though is incredibly straightforward. You can even do a simple inliner, that knows how to inline just the runtime callsites you care about.

Like the WebKit folks and the HHVM folks before them: dynamic languages have enough complexity that you often get most of the win from a “basic compilation” (compared to say C/C++) so after you’ve proven out what you need, you roll your own.

Shelling out though would be strictly worse than the LLVM in-memory approach, since it gets you no additional benefit (in some ways it’s harder, since you can’t just say “jump to this address”), you lose a lot of upside (custom passes, letting you tune optimizations and instruction selection beyond simply -O0, -O1, etc.), and then you get to require users to have a compiler on their box.

I’d personally look at nanojit or the other JIT libraries before shelling out to a regular compiler.

yeah. even LLVM is very slow for a JIT. Ask any project using LLVM as a backend for a JIT and they'll tell you it's a recurring issue. See for example https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...

I know very little about ruby specifically but IME for this kind of dynamic language you get most of the initial gains by:

- removing (by analysis or speculation) dynamic dispatch

- unboxing / avoiding allocations in the easy cases

Once you've done that, you can generate pretty dumb assembly and still come out way ahead of your interpreter (and avoid very costly optimization / instruction selection / regalloc / scheduling).

Most of what llvm / gcc do only makes sense when you've got your code down close to whatever you would actually write in C.
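
A minimal Python sketch of the first bullet (removing dynamic dispatch by speculation). The names `generic_add` and `make_speculative_add` are hypothetical, and the plain `a + b` on the fast path only stands in for the unboxed machine add a real JIT would emit.

```python
import operator


def generic_add(a, b):
    """Slow path: stands in for the interpreter's full dynamic dispatch."""
    return operator.add(a, b)


def make_speculative_add(stats):
    """Return an add routine specialised for ints, protected by a type guard."""
    def add(a, b):
        # Guard: the speculation only holds when both operands are exact ints.
        if type(a) is int and type(b) is int:
            stats["fast"] += 1
            return a + b  # placeholder for an unboxed integer add
        # Guard failed: fall back to the generic path.
        stats["slow"] += 1
        return generic_add(a, b)
    return add


stats = {"fast": 0, "slow": 0}
add = make_speculative_add(stats)
add(2, 3)      # takes the guarded fast path
add("a", "b")  # fails the guard and takes the generic path
```

The guard is a single cheap check; everything behind it can be compiled as if the types were static, which is where most of the early win comes from.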

They want to check the generated code and let people check it

> The main purpose of this JIT release is to provide a chance to check if it works for your platform and to find out security risks before the 2.6 release

LLVM's JIT is in a sense different from that of, say, PyPy. It's more primitive. When people talk about JIT in the context of LLVM, they mean the set of APIs provided by LLVM's library. That is, give it a set of IR functions, and things they depend on, the library dynamically compiles and links them for you. More concretely, for example, given the IR of a function, it gives you a raw pointer to the compiled version that you can call directly. It takes care of the boring (and often platform dependent) parts efficiently -- code gen, linking, etc -- so that you can focus on generating efficient IR (which is the hard part for a JIT).
Ruby on PyPy already exists: https://github.com/topazproject/topaz

Performance is disappointing, though.

I don't see why this is bad. Many compilers generate C
Not at runtime.
Varnish uses GCC to compile VCL into a dynamically loaded library. Not a general-purpose language, but it's done at runtime.
MemSQL and Smalltalk generate C at runtime.
If you mean Squeak, it surely does not.

Generating C is part of the bootstrapping process, it isn't used at runtime, the JIT generates the usual machine code directly.

Which Smalltalk?
Smalltalk/X can fileout packages as C projects that are then compiled by a C compiler. But AFAIK this was never meant to be used as a JIT and is primarily a deployment mechanism, and non-ancient versions use an in-process code generator implemented in Smalltalk as the JIT backend.

There are Common Lisp implementations that support a similar mechanism of generating C code (ECL, Kyoto CL...), but I don't think any of them compiles C into a .so which then gets dlopened right away as a poor man's JIT.

KCL generates .c files and compiles those to .o object files. I played with this year ago (via the descendant GCL: GNU Common Lisp). The load function handles object files, like COFF or whatever. It's reminiscent of the Linux kernel modules.

See here, starting on P. 36: http://www.softwarepreservation.org/projects/LISP/kcl/doc/kc...

When KCL compiles a lambda expression, it generates a C file called "gazonk.lsp" and compiles that.

(The above paper report is a little confusing; in some places it claims that an object file has a .o suffix, but then with regard to this gazonk implicit name, it claims that the fasl file is gazonk.fasl.)

Example with GCL: compile an individual function to C, compile that with the C compiler to a .o file (for example, on my 32-bit ARM it is an elf32-littlearm file), and then load it:

    >(defun foo (a) (* a 42)) 

    FOO

    >(compile 'foo)

    Compiling /tmp/gazonk_24158_0.lsp.
    End of Pass 1.  
    End of Pass 2.  
    OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
    Finished compiling /tmp/gazonk_24158_0.lsp.
    Loading /tmp/gazonk_24158_0.o
    start address -T 0x888488 Finished loading /tmp/gazonk_24158_0.o
    #<compiled-function FOO>
    NIL
    NIL
It's the traditional method of doing a job equivalent to JIT. It has several examples in the history of computing, like SpamAssassin and Matlab (the latter if my memory serves me correctly).
Same as the last stage of the SQL Server query optimizer then. (hey MS)
> Oh dear god.

Care to elaborate?

Overhead. It converts its IR to C; dumps that to disk; the C compiler loads the code back from the disk; the frontend parses the code (if not done carefully maybe CPP is also invoked); the compiler dumps the generated code to disk again; and then presumably dlopen loads the code back from disk again. There's also the overhead of spawning a separate compiler process. A better way would be to generate code directly in memory and link it. This is of course trickier, but is also what libraries such as LLVM's JIT infrastructure and libjit are built for. If you need more performance (i.e. LLVM's JIT is too slow for you), you roll your own infrastructure to do this -- which is what JVM and V8 do.
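
The steps listed above can be sketched in Python as follows. This is only an illustration of the general "print C, spawn a compiler, dlopen the result" pattern, not the Ruby implementation; `compile_via_c` and the file names are hypothetical, and the sketch assumes a Unix-like system with `cc`, `gcc`, or `clang` on the PATH.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile


def compile_via_c(c_source, func_name):
    """Write C to a file, run the system C compiler, and load the result.

    Returns a ctypes function taking and returning a C long, or None when
    no compiler is available or any step fails (a caller would then simply
    keep interpreting).
    """
    cc = shutil.which("cc") or shutil.which("gcc") or shutil.which("clang")
    if cc is None:
        return None  # this approach needs a compiler on the user's machine

    # The directory is left in place because the library stays loaded.
    workdir = tempfile.mkdtemp(prefix="jit_sketch_")
    src_path = os.path.join(workdir, "unit.c")
    lib_path = os.path.join(workdir, "unit.so")
    try:
        # Step 1: print the generated C code to a file.
        with open(src_path, "w") as f:
            f.write(c_source)
        # Step 2: spawn a separate compiler process that reads the file,
        # parses and optimizes it, and writes a shared object back out.
        subprocess.run(
            [cc, "-O2", "-shared", "-fPIC", "-o", lib_path, src_path],
            check=True,
        )
        # Step 3: load the shared object into this process.
        lib = ctypes.CDLL(lib_path)
        func = getattr(lib, func_name)
    except (OSError, subprocess.CalledProcessError, AttributeError):
        return None
    func.argtypes = [ctypes.c_long]
    func.restype = ctypes.c_long
    return func


times42 = compile_via_c("long times42(long a) { return a * 42; }\n", "times42")
```

Every numbered step is a file write, a process spawn, or a file read, which is the overhead being described; an in-memory JIT library replaces all three with function calls.
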
There's definitely a run-time overhead, but it may not be that bad in practice. Details on the Ruby JIT implementation are here: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch#mjit-...

They don't "dump to disk", if you mean an actual storage device. By default they store data to a "file system in memory" (a tmpfs), so it never gets written to a long-term storage device (not even an SSD). Even if you do "dump to disk", on a modern OS storing things in a file just puts it in memory and schedules it for eventual long-term storage. Of course, doing things this way has overheads, but it may not be so bad.

The C frontend has to parse things, of course, but it looks like they're heavily optimizing this. "To simplify JIT implementation the environment (C code header needed to C code generated by MJIT) is just an vm.c file. A special Ruby script minimize the environment (Removing about 90% of the declarations). One worker prepares a precompiled code of the minimized header, which starts at the MRI execution start".

Their current results are that "No Ruby program real time execution slow down because of MJIT" and "The compilation of small ISEQ takes about 50-70 ms on modern x86-64 CPUs". You're of course using more CPU (to do the compilations in parallel), and you have to have a compilation suite available at runtime, but in many circumstances that is perfectly reasonable.

IIRC, the gcc C compiler doesn't generate machine code itself either; it generates assembly code, which is then farmed out to a separate assembler process (using the GNU assembler, aka GAS). Farming out compilation work to other processes is not new.

It seems to me that this is a really plausible trade. This approach means that they can add a just-in-time compiler "relatively" quickly, and one that should produce pretty decent code once they add some actual optimizations (because it's building on very mature C compilers). The trade-off is that this approach requires more run-time CPU and time to create each compiled component (what you term as overhead). For many systems, this is probably an appropriate trade. As I posted earlier, I'm very interested in seeing how well this works - I think it's promising.
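
A minimal Python sketch of why slow compilation need not slow the program down: compilation runs on a background worker while calls keep going through the interpreted version, and the compiled version is swapped in once it is ready. The names `Method` and `start_compile_worker` are hypothetical, and `compile_fn` stands in for whatever actually produces native code.

```python
import queue
import threading


class Method:
    """A method that starts out interpreted and may later gain compiled code."""

    def __init__(self, interpreted):
        self.interpreted = interpreted
        self.compiled = None  # published by the worker when compilation finishes

    def call(self, *args):
        fn = self.compiled  # read once, so a concurrent publish is harmless
        return (fn or self.interpreted)(*args)


def start_compile_worker(compile_fn):
    """Start a background thread that compiles queued methods one at a time."""
    pending = queue.Queue()

    def worker():
        while True:
            method = pending.get()
            try:
                method.compiled = compile_fn(method)
            except Exception:
                pass  # on failure the method simply stays interpreted
            finally:
                pending.task_done()

    threading.Thread(target=worker, daemon=True).start()
    return pending


# Usage: the "compiler" here just returns an equivalent Python function.
method = Method(lambda x: x * 42)
pending = start_compile_worker(lambda m: (lambda x: x * 42))
pending.put(method)       # request compilation; the caller does not wait
result = method.call(2)   # works whether or not compilation has finished
```

The cost shows up as extra CPU on another core and a delay before the fast version kicks in, not as a pause in the running program.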

There are presumably better ways to get assembly out of code than generating C and passing it through a compiler frontend.
That is an unproven assumption.

It's faster to hand-generate machine code straight from an interpreter than to invoke a C compiler. But that is not the only issue. As with everything else, this is a trade-off, and I'm eager to see how it works out. I can see some positive reasons to do this:

1. The Ruby developers get highly-optimized machine code, with relatively little effort on their part. Many, many man-years have been spent to make C compilers generate highly optimal code.

2. The C language, as an interface, is extremely stable, so once it works it should just keep working. Compare that to the constantly-changing interfaces of many alternatives.

3. Debugging is WAY easier. If there's a problem in generated code, it's way easier to read intermediate C code (especially after going through a pretty-printer) than many other kinds of intermediate formats, and millions of people already know it.

In short, this approach means that they can very rapidly produce a system that can run tight loops very quickly, one that resists interface instability (so the approach should keep working), and one that's easy to debug (so it should be reliable). For many applications, the fact that it takes a little more time to do the compilation may be unimportant, especially since that work is embarrassingly parallelizable.

I'm very interested in seeing how this plays out. If this works well for Ruby, I suspect some other language implementations will start considering using this approach. I'm sure it's not the best approach in all circumstances, but it might work very well for Ruby - and maybe for some other languages like it.

"If it works, it isn't stupid".

> The Ruby developers get highly-optimized machine code, with relatively little effort on their part. Many, many man-years have been spent to make C compilers generate highly optimal code.

Not for machine generated code. C compilers work well on human generated code, and not as well on Ruby -> C "translations".

> Not for machine generated code. C compilers work well on human generated code, and not as well on Ruby -> C "translations".

That depends on the machine generated code. C compilers are optimized for whatever the C compiler authors perceive as a common construct. If the generated C code uses constructs similar to what humans do, it's often quite good. If not, you can change the code that generates C, or in some cases you can convince the C compiler authors to optimize that situation as well.

Why aren't they using LLVM :-/
They answer this on the github:

> Unstable interfaces. An LLVM JIT is already used by Rubicon. A lot of efforts in preparation of code used by RTL insns (an environment)

https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch#a-few...

llvm isn't always available, doesn't support as many architectures and doesn't always give the best performance.

The *nix philosophy has long been towards trying to provide choice wherever possible, so that people can use the tool that best meets their needs.

The new Ruby method JIT can use either GCC or clang as the backend. It uses C as an intermediate representation.
that's also what Theano did