Hacker News new | ask | show | jobs
by catnaroek 3553 days ago
> assume it'll always be & depotimize (at a penalty) if it isn't

Stuff like this makes me nervous. Performance is already a complex topic, and stuff like this makes it even more complex. Unnecessarily so. If we were talking about a very high-level programming language (say, Prolog), you could argue that the expressiveness benefits outweigh the cost of the runtime system's complexity. But Java isn't even as expressive as C++, let alone Prolog.

> fully resolved types of objects (ie devirtualization)

C++ (and similar languages: D, Rust, etc.) and MLton (a Standard ML implementation) have been using monomorphization for ages, which is a compile-time analogue of devirtualization. Moreover, monomorphization has important advantages over devirtualization:

(0) It's completely predictable. You don't need to guess when it will happen. It happens iff the concrete type (and its relevant vtables, if necessary) can be determined at compile-time: https://blog.rust-lang.org/2015/05/11/traits.html

(1) It's always a sound optimization, so it doesn't have to be undone at runtime under any circumstances.

(2) It's relatively simple to implement. In fact, a compiler front-end can completely monomorphize a program before handing it over to the back-end for target code generation.

> if a value is only used inside a branch that's never taken, don't bother mutating it)

The best way to handle unreachable branches is to avoid creating them in the first place. With proper use of algebraic data types and pattern matching, unreachable branches can be kept to a minimum, or even outright eliminated in many cases.

> data sizes (this "array" is only ever size 2, store it in registers)

C and similar languages natively handle statically sized arrays, so there's no need for runtime profiling and analysis just to determine that an array will always have size 2.

ML does something even better: you just use tuples (in this case, pairs), which reflect your intent much better than using arrays whose size has to be tested or guessed.

---

What I take away from this is that the JVM's supposedly “fancy” optimizations exist primarily to work around the Java language's lack of amenability to static analysis.

3 comments

Hence why we have projects Valhala and Panama.

However bringing those features to Java and the JVM specification, while keeping existing code running, is a big engineering effort, which is why they are targeting Java 10+.

I would have like that back in those days, Gosling and his team would have taken the inspiration from Modula-3, Oberon, Component Pascal and many others.

But better later than never, and regarding Java 10 time frame, I am happy they aren't doing a Python break the world thing.

I think you're ignoring the fact that there is lots of extra information available at runtime that isn't available from static analysis of languages even if they're very amenable to that, and that static analysis can actually give you worse information.

For example a call site could be statically analysed to be bimoprhic, but then sometimes when you run it the second type is never actually used and the call site can be made monomorphic.

The same thing applies to branches - they're both possible to use, but often when you run it with real data you only actually use one of them.

So I don't think these optimisations are primarily to work around amenability to static analysis - they achieve somethign different and actually more powerful.

I can even give you a real-world example of where static analysis is actually what causes unpredictable performance. There is an implementation of the Ruby language called Rubinius that statically looks at the instance variables in a class that are visible in the source code, and optimises the objects for that many instance variables. If you start to set extra variables dynamically, and so upset this static analysis, performance drops by a half. In the implementation of Ruby that I work on we can see the static references to instance variables in the source code but don't try to do anything based on this - we purely use a hidden class system and let the true number of instance variables emerge dynamically at runtime, and we don't have the same performance drop when you start to set extra variables dynamically (http://chrisseaton.com/rubytruffle/pppj14-om/pppj14-om.pdf)

I think it's quite a neat philosophy to apply - don't assume anything statically and let the real characteristics of the program's data and control flow emerge at runtime.

> There is an implementation of the Ruby language called Rubinius that statically looks at the instance variables in a class that are visible in the source code, and optimises the objects for that many instance variables. If you start to set extra variables dynamically, and so upset this static analysis, performance drops by a half.

You can implement an unsound static analysis for any language, and this in fact what Rubinius is doing: it's making potentially wrong conclusions about what instance variables will exist in objects. However, unsound analyses are outright harmful:

(0) Undoing unsound “optimizations” costs even more performance than was supposed to be gained by optimizing your program. (As you found out the hard way yourself.)

(1) They lie to you about what your code means! If this isn't bad enough, I don't know what else could be.

Unfortunately, a sound static analysis of Ruby code wouldn't be able to tell you much, precisely because Ruby allows you to subvert everything at runtime.

> As you found out the hard way yourself.

That makes it sound like I implemented it - I didn't - I implemented the alternative mechanism which doesn't have the same problem.

> With proper use of algebraic data types and pattern matching, unreachable branches can be kept to a minimum, or even outright eliminated in many cases.

Those would still result in branches. speculative optimizations can turn the common case into branch-free code that uses traps in the uncommon ones to bail out. But since those things are often data-dependent you can't just bake them in at compile-time.