by teddyknox
3386 days ago
What differentiates LLVM IR from, say, JVM bytecode? I'm curious because there's a stalled-out GNU project under GCC called GCJ that would compile JVM bytecode to native code. I wonder whether the issue was that statically linking the JVM into the binary resulted in a lot of bloat, or something more intrinsic to the suitability of JVM bytecode as a platform-independent IR...
* JVM bytecode is stack-based, whereas LLVM IR uses "infinite registers" in SSA form
* SSA form is convenient for compiler passes to consume, but it comes with quirks that mean you don't really want to write in that style by hand: the mind-bending phi instructions, the rule that definitions must dominate uses, the fact that a simple op like incrementing a variable really means creating a new variable, etc.
* JVM bytecode carries a lot of Java-level information. For instance, if you have N classes with M methods each in source, you will typically find N classes with M methods in bytecode too. A lot of Java keywords have an equivalent in bytecode (e.g. private, protected, public, switch, new...)
* in contrast, LLVM IR feels closer to C (it only knows about globals, arrays, structs and functions). It exposes lower-level constructs like vector instructions and intrinsics like memcpy
* JVM bytecode is well specified: anyone armed with the PDF [1] can implement a full JVM. LLVM IR is somewhat loosely defined and evolves based on the needs of the various targets
* JVM bytecode is truly portable, whereas target ABI details leak into LLVM IR. A biggie is 32-bit vs 64-bit LLVM IR.
[1] https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf
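
To make the stack vs SSA point concrete, here is a rough Python sketch of `x = x + 1` in both styles. The opcode names, tuple layout, and the `run_stack` / `run_ssa` helpers are all made up for illustration; they are not real JVM or LLVM encodings, just toy evaluators under those assumptions.

```python
# Toy illustration only: hypothetical opcodes, not real JVM or LLVM encodings.

def run_stack(program, local_vars):
    """Evaluate a stack-machine program, loosely in the style of JVM bytecode."""
    stack = []
    for op, *args in program:
        if op == "load":      # push a local variable onto the operand stack
            stack.append(local_vars[args[0]])
        elif op == "const":   # push a constant
            stack.append(args[0])
        elif op == "add":     # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "store":   # pop the top of the stack into a local variable
            local_vars[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown op {op!r}")
    return local_vars


def run_ssa(program, initial_values):
    """Evaluate a register program in which every name is assigned exactly once."""
    values = dict(initial_values)
    for dest, op, lhs, rhs in program:
        if dest in values:
            raise ValueError(f"{dest} assigned twice: not SSA")
        if op != "add":
            raise ValueError(f"unknown op {op!r}")
        # Operands are either names of earlier definitions or literal constants.
        a = values[lhs] if isinstance(lhs, str) else lhs
        b = values[rhs] if isinstance(rhs, str) else rhs
        values[dest] = a + b
    return values


# x = x + 1, stack style: the local slot "x" is overwritten in place.
stack_prog = [("load", "x"), ("const", 1), ("add",), ("store", "x")]

# x = x + 1, SSA style: the result gets a fresh name x1, and x0 is left intact.
ssa_prog = [("x1", "add", "x0", 1)]

stack_result = run_stack(stack_prog, {"x": 41})
ssa_result = run_ssa(ssa_prog, {"x0": 41})
print(stack_result)
print(ssa_result)
```

The stack version needs four instructions and mutates `x`, while the SSA version is a single instruction that defines a new value. Keeping every old value around under its own name is what makes SSA easy for passes to analyze, and also what makes it tedious to write by hand.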