Hacker News
by cfv 2246 days ago
I still remember when Clang, bringing LLVM along with it, was seen as SO OUT THERE. I'm only mentioning it because it feels weird to be old enough to watch fads in systems languages come and start to go.

Just curious, do you have any examples of these "limitations" you speak of? Sounds like a very interesting read.


As an example, WebKit had an LLVM-based JavaScript optimizer in 2014 (https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/), but dropped it for another one in 2016 (https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...)

In broad strokes, LLVM prioritizes generating good code for statically compiled languages over, for example, memory usage, compilation speed, or the ability to dynamically change already-compiled code. That doesn't make it optimal for JavaScript, a highly dynamic language that is often used in cases where compilation time can easily dwarf execution time.
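To make the compile-time point concrete, here is a toy cost model (an assumption for illustration, not how WebKit actually decides when to tier up): an optimizing compile only pays for itself once a function has run enough times for the per-call savings to cover the one-time compile cost. The function name and all numbers are hypothetical.

```python
import math

def break_even_calls(compile_ms, baseline_us, optimized_us):
    """Calls needed before the time saved per call has repaid the compile time.

    compile_ms:   one-time cost of the optimizing compile, in milliseconds
    baseline_us:  cost of one call in the cheaper tier, in microseconds
    optimized_us: cost of one call after optimizing, in microseconds
    """
    saved_per_call_us = baseline_us - optimized_us
    if saved_per_call_us <= 0:
        return math.inf  # optimizing never pays off
    return math.ceil(compile_ms * 1000 / saved_per_call_us)

# Hypothetical numbers: a fast optimizer vs. one that is 10x slower to run
# but emits 25% faster code (1.5 us per call instead of 2 us).
fast = break_even_calls(compile_ms=50, baseline_us=10, optimized_us=2)
slow = break_even_calls(compile_ms=500, baseline_us=10, optimized_us=1.5)
print(fast, slow)  # 6250 58824
```

Under these made-up numbers the slower, better compiler needs roughly 9x as many calls before it wins, and a lot of JavaScript never runs that long.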

Worth noting that B3’s biggest win was higher peak throughput. It generated better code than llvm. It achieved that by having an IR that lets us be a lot more precise about things that were important to our front end compiler.

It’s not even about what language you’re compiling. It’s about the IR that goes into llvm, or whatever you would use instead of llvm. If that IR generally does C-like things and can only describe types and aliasing to the level of fidelity that C can (i.e., structured assembly with crude hacks that let you sometimes pretend that you have a super janky abstract machine), then llvm is great. Otherwise it’s a missed opportunity.

> It achieved that by having an IR that lets us be a lot more precise about things that were important to our front end compiler.

Do you have any examples offhand? I presume caring about patchpoints and OSR is a fair gain to start with?

And aliasing. The aliasing story in B3 is so wonderful. That was one of the biggest wins: being able to say, for example, that something can side-exit (and can do weird shit after exiting) but doesn't write any state if it falls through.
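A toy model of why that matters, not B3's actual API: every name here (`HeapRange`, `Effects`, `exits_sideways`, `forward_loads`) is hypothetical. If an instruction's effects separate "may exit sideways" from "writes this heap range on fall-through", a redundant load after a check can be replaced by the earlier load. If the exit instead has to be modeled as something that may write anything (roughly what happens when a front end can only express it as an opaque call in a C-like IR, which is an assumption here), the second load must stay.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class HeapRange:
    """Hypothetical abstract heap location: the half-open range [begin, end)."""
    begin: int
    end: int

    def overlaps(self, other: "HeapRange") -> bool:
        return self.begin < other.end and other.begin < self.end

TOP = HeapRange(0, 2**32)  # "may touch any heap location"

@dataclass(frozen=True)
class Effects:
    exits_sideways: bool = False        # may leave the block (e.g. an OSR exit)
    writes: Optional[HeapRange] = None  # heap written on the fall-through path only

@dataclass(frozen=True)
class Instr:
    dest: str
    op: str                           # "load", or any other opcode name
    heap: Optional[HeapRange] = None  # location read by a load
    effects: Effects = field(default_factory=Effects)

def forward_loads(block, exit_clobbers_all=False):
    """Reuse an earlier load of the same location if nothing in between writes it.

    With exit_clobbers_all=True, anything that may exit sideways is treated as
    writing everything: a coarser model that cannot tell "may exit" from "may write".
    """
    available = {}  # HeapRange -> name of the value already loaded from it
    out = []
    for ins in block:
        if ins.op == "load":
            if ins.heap in available:
                out.append(f"{ins.dest} = {available[ins.heap]}")
                continue
            available[ins.heap] = ins.dest
            out.append(f"{ins.dest} = load [{ins.heap.begin}, {ins.heap.end})")
            continue
        writes = ins.effects.writes
        if exit_clobbers_all and ins.effects.exits_sideways:
            writes = TOP
        if writes is not None:
            # Forget every cached load that this write might overwrite.
            available = {h: v for h, v in available.items() if not h.overlaps(writes)}
        out.append(f"{ins.dest} = {ins.op}")
    return out

field_x = HeapRange(8, 16)
block = [
    Instr("a", "load", field_x),
    Instr("c", "check", effects=Effects(exits_sideways=True)),
    Instr("b", "load", field_x),
]
print(forward_loads(block))                          # ['a = load [8, 16)', 'c = check', 'b = a']
print(forward_loads(block, exit_clobbers_all=True))  # ['a = load [8, 16)', 'c = check', 'b = load [8, 16)']
```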

LLVM's MCJIT library is 17MB. If you have a language you want to JIT and you thought you could embed it the way people embed Lua (<100k) or Python (used to be ~250k, now <3M), you're looking at almost 20MB right out of the gate. Not ideal!

Also, if you want to use llvm as a backend for your project and expect to build llvm as part of a vendored package, the llvm libraries with debug symbols on my machine were about 3GB. Also not ideal.

Llvm makes some questionable choices about how to do SSA, alias analysis, register allocation, and instruction selection. Also it goes all in on UB optimizations even when experience from other compilers shows that it’s not really needed. Maybe those choices are really fundamental and there is no escaping them to get peak perf - but you’re not going to know for sure until folks try alternatives. Those alternatives likely require building something totally new from scratch because we are talking about things that are fundamental to llvm even if they aren’t fundamental to compilers in general.
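As a concrete, widely cited example of an optimization that leans on UB: in C, signed integer overflow is undefined, so an optimizing compiler such as Clang or GCC may fold `x + 1 > x` to true for a signed `int x`. The Python sketch below only simulates the two readings on 32-bit values; `wrap32` and the helper names are made up for this illustration.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def wrap32(v: int) -> int:
    """Reduce an arbitrary Python int to a two's-complement signed 32-bit value."""
    return (v - INT32_MIN) % 2**32 + INT32_MIN

def incremented_is_greater_wrapping(x: int) -> bool:
    # `x + 1 > x` when overflow wraps (Java `int`, or C built with -fwrapv).
    return wrap32(x + 1) > x

def incremented_is_greater_assuming_no_overflow(x: int) -> bool:
    # If signed overflow is UB, the compiler may assume x + 1 never overflows,
    # so `x + 1 > x` is always true and the comparison folds to a constant.
    return True

for x in (0, 41, INT32_MAX):
    print(x, incremented_is_greater_wrapping(x), incremented_is_greater_assuming_no_overflow(x))
```

The two readings agree everywhere except at `INT32_MAX`, which is exactly the input where the "impossible" case actually happens.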

I dislike UB too, but at the language level. By the time code reaches LLVM, UB can only have been removed, and can only continue to be removed, never added. (From a global point of view, under the general as-if rule a compiler can always generate its own boilerplate in which it knows something cannot happen, and then later leverage "UB" to, e.g., trim impossible paths that really are impossible in this case, at least barring other, "real" language-level UB.) So is there really any drawback to exploiting "UB" internally (maybe we should call it something else then) if, for example, the source language had none?

It is true that compilers sometimes need operations whose semantics are defined only if certain conditions hold. But LLVM's and C's interpretation of what happens when those conditions don't hold is extraordinarily liberal, and I'm not sure that is either beneficial or sane.

Like, LLVM tries not to add UB, but design choices it made to support optimization with UB do sometimes result in new UB being introduced, like the horror show that happens with `undef` and code versioning.
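For readers new to `undef`: LLVM's LangRef allows an `undef` value to take a different value at each use, which is part of why transformations that duplicate or move uses of a value (versioning among them) get delicate. The toy Python model below is not LLVM code; `UndefValue`, `OrdinaryValue`, and `equals_itself` are hypothetical names, and an ever-increasing counter stands in for a compiler picking whatever value is convenient at each use.

```python
import itertools

class UndefValue:
    """Toy model of LLVM `undef`: each use may observe a different value.

    An adversarial counter hands out a new value on every read.
    """
    def __init__(self):
        self._next = itertools.count()

    def use(self) -> int:
        return next(self._next)

class OrdinaryValue:
    """A normal SSA value: defined once, so every use sees the same bits."""
    def __init__(self, bits: int):
        self.bits = bits

    def use(self) -> int:
        return self.bits

def equals_itself(v) -> bool:
    # Models `icmp eq %x, %x`: two separate uses of the same SSA value.
    return v.use() == v.use()

print(equals_itself(OrdinaryValue(42)))  # True
print(equals_itself(UndefValue()))       # False
```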

So, I think that optimizing with UB internally is fine but only if it's some kind of bounded UB where you promise something stronger than nasal demons.

> Llvm makes some questionable choices about how to do SSA, alias analysis, register allocation, and instruction selection.

Do you mind expanding on these points or pointing me to some places where I can learn more about them? Compilers are a fairly new field for me, so anything I can learn about their design decisions and tradeoffs is worth its weight in gold.

HN isn't the place to go for conservative opinions on compilers :)