Hacker News new | ask | show | jobs
by avbor 3555 days ago
I'm not familiar enough with compilers, but why would an ahead of time compiler perform worse than a just in time compiler in a static language? I think I'd understand if it was a dynamic language, because you can't know the types for sure until you start running the program, but are similar issues present for Java?
5 comments

Yes, Java code relies a lot on devirtualization to get good performance. This is because the language allows every function to be virtual by default. The flip-side is that this make coding styles that make heavy usage of interfaces (arguably good for testing, interoperability, decoupling, etc.) effectively "free" compared to ones using concrete classes.
Most of the JIT optimizations amount to "it's been called like this in the past; assume it'll always be & depotimize (at a penalty) if it isn't."

That potentially includes the fully resolved types of objects (ie devirtualization), branch prediction (stronger than the CPU can do; for instance, if a value is only used inside a branch that's never taken, don't bother mutating it), data sizes (this "array" is only ever size 2, store it in registers), dead code elimination (keeps the compiled code small), and a whole bunch more fun stuff.

> assume it'll always be & depotimize (at a penalty) if it isn't

Stuff like this makes me nervous. Performance is already a complex topic, and stuff like this makes it even more complex. Unnecessarily so. If we were talking about a very high-level programming language (say, Prolog), you could argue that the expressiveness benefits outweigh the cost of the runtime system's complexity. But Java isn't even as expressive as C++, let alone Prolog.

> fully resolved types of objects (ie devirtualization)

C++ (and similar languages: D, Rust, etc.) and MLton (a Standard ML implementation) have been using monomorphization for ages, which is a compile-time analogue of devirtualization. Moreover, monomorphization has important advantages over devirtualization:

(0) It's completely predictable. You don't need to guess when it will happen. It happens iff the concrete type (and its relevant vtables, if necessary) can be determined at compile-time: https://blog.rust-lang.org/2015/05/11/traits.html

(1) It's always a sound optimization, so it doesn't have to be undone at runtime under any circumstances.

(2) It's relatively simple to implement. In fact, a compiler front-end can completely monomorphize a program before handing it over to the back-end for target code generation.

> if a value is only used inside a branch that's never taken, don't bother mutating it)

The best way to handle unreachable branches is to avoid creating them in the first place. With proper use of algebraic data types and pattern matching, unreachable branches can be kept to a minimum, or even outright eliminated in many cases.

> data sizes (this "array" is only ever size 2, store it in registers)

C and similar languages natively handle statically sized arrays, so there's no need for runtime profiling and analysis just to determine that an array will always have size 2.

ML does something even better: you just use tuples (in this case, pairs), which reflect your intent much better than using arrays whose size has to be tested or guessed.

---

What I take away from this is that the JVM's supposedly “fancy” optimizations exist primarily to work around the Java language's lack of amenability to static analysis.

Hence why we have projects Valhala and Panama.

However bringing those features to Java and the JVM specification, while keeping existing code running, is a big engineering effort, which is why they are targeting Java 10+.

I would have like that back in those days, Gosling and his team would have taken the inspiration from Modula-3, Oberon, Component Pascal and many others.

But better later than never, and regarding Java 10 time frame, I am happy they aren't doing a Python break the world thing.

I think you're ignoring the fact that there is lots of extra information available at runtime that isn't available from static analysis of languages even if they're very amenable to that, and that static analysis can actually give you worse information.

For example a call site could be statically analysed to be bimoprhic, but then sometimes when you run it the second type is never actually used and the call site can be made monomorphic.

The same thing applies to branches - they're both possible to use, but often when you run it with real data you only actually use one of them.

So I don't think these optimisations are primarily to work around amenability to static analysis - they achieve somethign different and actually more powerful.

I can even give you a real-world example of where static analysis is actually what causes unpredictable performance. There is an implementation of the Ruby language called Rubinius that statically looks at the instance variables in a class that are visible in the source code, and optimises the objects for that many instance variables. If you start to set extra variables dynamically, and so upset this static analysis, performance drops by a half. In the implementation of Ruby that I work on we can see the static references to instance variables in the source code but don't try to do anything based on this - we purely use a hidden class system and let the true number of instance variables emerge dynamically at runtime, and we don't have the same performance drop when you start to set extra variables dynamically (http://chrisseaton.com/rubytruffle/pppj14-om/pppj14-om.pdf)

I think it's quite a neat philosophy to apply - don't assume anything statically and let the real characteristics of the program's data and control flow emerge at runtime.

> There is an implementation of the Ruby language called Rubinius that statically looks at the instance variables in a class that are visible in the source code, and optimises the objects for that many instance variables. If you start to set extra variables dynamically, and so upset this static analysis, performance drops by a half.

You can implement an unsound static analysis for any language, and this in fact what Rubinius is doing: it's making potentially wrong conclusions about what instance variables will exist in objects. However, unsound analyses are outright harmful:

(0) Undoing unsound “optimizations” costs even more performance than was supposed to be gained by optimizing your program. (As you found out the hard way yourself.)

(1) They lie to you about what your code means! If this isn't bad enough, I don't know what else could be.

Unfortunately, a sound static analysis of Ruby code wouldn't be able to tell you much, precisely because Ruby allows you to subvert everything at runtime.

> As you found out the hard way yourself.

That makes it sound like I implemented it - I didn't - I implemented the alternative mechanism which doesn't have the same problem.

> With proper use of algebraic data types and pattern matching, unreachable branches can be kept to a minimum, or even outright eliminated in many cases.

Those would still result in branches. speculative optimizations can turn the common case into branch-free code that uses traps in the uncommon ones to bail out. But since those things are often data-dependent you can't just bake them in at compile-time.

One thing JIT compilers are good at is specializing methods for common (runtime) types; e.g. you have a method operating on Iterable but it turns out most of the time you get a List as input; you can generate code that bypasses the method lookups for .size etc. The question of whether this pays off is usually only answerable at runtime.
Actually that's not a thing most JIT compilers are good at, as such runtimes are hard to build. For the mainstream ones, you only have Java and the major Javascript runtimes.
A JIT can use information from runtime for optimizations, you pay a cost in startup time. Being worse or better depends on the workloads and implementations.
AOT can as well. PGO.
Yes, but that locks you in to optimising for whatever you covered with your profiling. If the character of your data changes, it'll take a recompile to change how the application performs, while a JIT can potentially choose to deoptimise/reoptimise.

I'm all for "as static as possible" toolchains, but there are optimization opportunities you simply won't have with AOT, PGO or not. E.g. consider something trivial: A program doing certain image operations that depends on dimensions passed in on the command line. A JIT could optimise the inner loops for the actual operations. To get the same with AOT even with PGO would be totally unable to deal with it without causing a massive explosion in code size.

Hence why .NET also has MPGO, Managed Profile Guided Optimizations for NGEN.

https://msdn.microsoft.com/en-us/library/hh873180(v=vs.110)....

In theory yes, but sadly i have never seen that promise realized in any consistent fashion. Same goes for Java's escape analysis. Although the principle is sound i think the engineering required is horrendously difficult to make it robust. In a very narrow window of variation it works, but should you step out of that zone it fails pretty badly. I think it will take many more years, till then pgo and metaprogramming it is.
Regarding escape analysis, beware that the OpenJDK is the worst of them all.

IBM J9, Graal are much better at it, and I bet Aonix, Azul and other JDKs targeting high performance deployment scenarios are even better.

Speculative optimisations yield about a 20% improvement in Java and more for higher level languages like Scala (and for Ruby it's off the charts). HotSpot isn't perfect by any means and C2's EA is not strong enough, but it's on track to be replaced with Graal (which is a part of what AOT is all about - you don't want your VM compiling itself at the same time as compiling your app).
PGO in C++ can yield quite significant speedups, however the difficulty of integrating it into a development workflow means that in practice it's hardly ever used. One of Java's accomplishments is that it brought PGO to the masses by making it entirely built in and automatic.
There are a set of optimisations that you can only do at runtime that you can't do with AOT. Anything that depends on data is something that you can't reliably do, such as eliding null checks if you can prove this cannot happen based on the data passed in. There are also cases where you can have multiple subclasses of a type such as a normal and a debug subclass, or multiple drivers for different back ends such as MongoDB or Cassandra, only one of which is used at runtime but you cannot know ahead of time which is selected (for example, it's based on an environment variable or system property).

The point is that while AOT can do a set of optimisations, including whole module analysis, there are a set which are only available at runtime.

Almost all languages can benefit from profile guided optimisations, including very static languages like C++