by metaobject 2940 days ago
Language implementation-wise, can anyone explain why/how Julia is able to get close to C-level performance? Is it doing some extra steps under the hood (JIT compilation?) that Python and R aren't doing?
5 comments

Julia's JIT compilation is rather different from what is called JIT compilation in other languages such as Java or JavaScript. There, the language is interpreted (which may mean interpreting instructions for a virtual machine such as the JVM), and the runtime decides whether some code is being hit frequently enough to warrant compilation to native code. Julia first compiles to an AST representation (also expanding macros, etc.) and performs type inference. When a method is called with argument types that haven't been used before to call that method, that's when Julia does its magic and compiles a version of that method specialized for those types, using LLVM to generate the final machine code (just like most C and C++ implementations these days, as well as Rust and others). That also means that it's rare for Julia to have to dynamically dispatch methods based on the type of the arguments, which is one of the things that can really slow down other languages with dynamic types.
The easiest way to think about Julia's performance is closely related to the observations that inspire tracing JITs for many languages -- most code in dynamic languages doesn't make use of the features that make efficient compilation impossible. Julia's response to that observation was to build a dynamic language that lacked some of the most extreme features in Python or R that act as barriers to efficient compilation.
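To make the "compile one version per combination of argument types" idea concrete, here is a toy Python sketch. It is only an analogy, not how Julia is implemented: `SpecializingFunction`, `compile_count`, and the `_compile` step are hypothetical names, and the "compilation" is just a closure standing in for type inference plus LLVM code generation.

```python
from typing import Any, Callable, Dict, Tuple


class SpecializingFunction:
    """Toy model: build one specialized version per tuple of argument types."""

    def __init__(self, generic: Callable[..., Any]) -> None:
        self.generic = generic
        self.cache: Dict[Tuple[type, ...], Callable[..., Any]] = {}
        self.compile_count = 0  # how many times the "compiler" has run

    def _compile(self, arg_types: Tuple[type, ...]) -> Callable[..., Any]:
        # Stand-in for type inference and native code generation.
        self.compile_count += 1
        generic = self.generic

        def specialized(*args: Any) -> Any:
            return generic(*args)

        return specialized

    def __call__(self, *args: Any) -> Any:
        key = tuple(type(a) for a in args)
        specialized = self.cache.get(key)
        if specialized is None:
            # First call with these argument types: compile, then remember.
            specialized = self._compile(key)
            self.cache[key] = specialized
        return specialized(*args)


@SpecializingFunction
def add(x, y):
    return x + y


add(1, 2)      # first (int, int) call: triggers a "compile"
add(3, 4)      # same types: reuses the cached version
add(1.0, 2.0)  # new types (float, float): triggers a second "compile"
```

After these three calls the cache holds two entries, one for `(int, int)` and one for `(float, float)`. In a real compiler the payoff is that each specialized body can assume its argument types and drop the run-time checks.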
It's also worth noting that Julia's JIT isn't tracing: it does all its compilation before the code is run (unless it hits a path which hasn't been run before, or wasn't inlined, in which case it runs the compiler again). I've heard it described as "really an ahead-of-time compiler that just runs really, really late".
BTW, is there a write-up of what those blocking features are? I don't recall ever seeing a blog about that. Could be an interesting "if you want to make a JIT-friendly language, don't do this, do this instead" type of article.
I agree that would be great. The crude answer is: make it easier for a computer to figure out what will happen when you run the code.

The example I usually use is allowing integers to overflow, instead of automatically promoting to arbitrary precision (Python), or converting to a sentinel value (R). Integers are used in a _lot_ of places, so inserting these checks (or worse, access to heap-allocated memory) makes it difficult to optimise. (Throwing an error might be a reasonable alternative in some cases.)
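A small Python sketch of the difference, for illustration: Python's built-in `int` silently promotes past 64 bits, whereas a fixed-width machine integer wraps around. The helper `wrapping_add_int64` is a hypothetical name that emulates two's-complement 64-bit addition; it is not part of any library.

```python
MASK64 = (1 << 64) - 1
INT64_MAX = 2**63 - 1


def wrapping_add_int64(a: int, b: int) -> int:
    """Add two values the way a 64-bit two's-complement register would."""
    result = (a + b) & MASK64   # keep only the low 64 bits
    if result >= 1 << 63:       # reinterpret the top bit as the sign
        result -= 1 << 64
    return result


promoted = INT64_MAX + 1                    # Python: grows to arbitrary precision
wrapped = wrapping_add_int64(INT64_MAX, 1)  # fixed width: wraps to the minimum

print(promoted)  # 9223372036854775808
print(wrapped)   # -9223372036854775808
```

The wrapping version maps to a single machine add instruction, while the promoting version needs an overflow check and possibly a heap allocation on every addition.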

Another is that you make it easier for the compiler to figure out things about an object, such as its size (e.g. you can declare the types of the fields of a Julia struct) and whether or not it can be mutated (immutable objects are easier to optimise).
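As a rough Python analogy (an assumption-laden sketch, not Julia semantics): once field types are fixed-width, the size of a record is known up front, and an immutable record cannot change after construction. `Sample` and `LAYOUT` are hypothetical names; note that Python dataclass annotations are not enforced, so the `struct` format string is what actually pins down the layout here.

```python
import struct
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Sample:
    count: int    # annotation only; Python does not enforce it
    value: float


# 'q' is a 64-bit signed integer, 'd' a 64-bit float; '<' means no padding.
LAYOUT = struct.Struct("<qd")

s = Sample(3, 1.5)
packed = LAYOUT.pack(s.count, s.value)
size = LAYOUT.size  # known from the declared field types alone: 16 bytes

try:
    s.count = 4  # frozen dataclasses reject assignment after construction
    mutated = True
except FrozenInstanceError:
    mutated = False
```

With the size and immutability known in advance, a compiler is free to keep such a value in registers or inline it in an array instead of boxing it on the heap.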

I wouldn't use that (allowing integers to overflow) as a primary example, because one of the great things about Julia is that it is incredibly easy to define your own types that will simply work. For example, you can use types that do checked arithmetic on integers (SaferIntegers.jl, I think, is one), or types with no limit at all (BigInt, which is included in Julia). Julia gives the programmer the choice, and not only that, allows the programmer to create their own choices.
> The example I usually use is allowing integers to overflow, instead of automatically promoting to arbitrary precision (Python), or converting to a sentinel value (R).

IIRC Julia used to automatically promote integers. Is this the main reason why that was dropped?

No, I don't believe integers ever promoted on overflow (or at least not since 2012).

If an operation involves two different integer types, they do promote to the larger one (e.g. an Int64 + a BigInt will give a BigInt).

Oh no, I'm very certain of this: I distinctly recall a GitHub issue where people complained that adding two 32-bit integers resulted in a 64-bit integer, which was justified as giving more correct answers due to potential integer overflow.
One of them is being able to override attribute access (`__setattr__` and `__getattr__`) at runtime in Python. It can be pretty tricky to prove it can't happen, so (unless you have optimistic and pessimistic codepaths) you get into a situation where every attribute lookup makes indirect function calls and hash-table lookups.
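A minimal Python sketch of the problem, assuming the hooks meant here are the `__setattr__` and `__getattr__` special methods. The class `Point` and the two replacement functions are hypothetical; the point is that an ordinary-looking `p.x = 3` can change meaning while the program is running.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
before = p.x  # plain attribute lookup


def scaling_setattr(self, name, value):
    # Replacement hook: every later assignment stores ten times the value.
    object.__setattr__(self, name, value * 10)


Point.__setattr__ = scaling_setattr  # installed at runtime, on a live class
p.x = 3
after = p.x  # the same syntax now runs scaling_setattr first


def fallback_getattr(self, name):
    # Called only when normal lookup fails.
    return f"<missing {name}>"


Point.__getattr__ = fallback_getattr
missing = p.z  # no such attribute, so the fallback hook answers
```

Because any code anywhere can install hooks like these, a compiler cannot treat `p.x` as a fixed-offset memory load unless it also keeps a guarded slow path.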
Yes, it's JIT compiled.

And my (very crude) understanding is that the stronger type system makes this much easier than in Python. The compiled version of any function is specific to the types of its inputs, and thus need not contain any further checks: simple functions often end up with literally the same assembly as C would produce.

"Extremely lazy ahead-of-time compiled" is one way I've described the compilation model, since you're basically never executing code in an interpreted fashion (usually JITs let you do either). Also, a JIT's choice of when to compile is typically non-deterministic, or deterministic but difficult to understand. When Julia chooses to compile is pretty easy to understand.
I believe though that there is some work being done on actually directly interpreting the AST, in cases where going through all the work of generating LLVM IR and compiling that to native code is unnecessary, particularly when it is code that is only run once when a package is compiled the first time.
If anyone wants to do more research on this, the keyword is "monomorphization".
Perhaps the best way to understand what makes Julia fast is to watch these two videos about Python and R and what makes them so hard to optimize:

https://www.youtube.com/watch?v=qCGofLIzX6g

https://www.youtube.com/watch?v=HStF1RJOyxI

Take everything mentioned in these videos that makes Python and R really hard to optimize and don't do those things :D

Yes, it's using JIT compilation (last I checked, they are using LLVM as the backend). That is combined with a language design that takes JIT compilation into account from the get-go, which makes the problem much easier than trying to bolt on a JIT later (see e.g. PyPy).