by wmobit 5503 days ago
Google has said before that they didn't use Clang and LLVM because of performance issues. GCC is far from dead, and probably never will be. Clang-generated code is still typically 10-20% slower than GCC-generated code. Lots of companies work on GCC, including Google, Intel, AMD, IBM, Red Hat and others.

I've seen this "10-20% slower" meme several times, but I've also read several accounts of 10-20% faster runtime performance.

All I know for sure is I compiled my codebase with Clang for the first time yesterday and the compilation time was absurdly short. I thought the compiler was broken. And it enables excellent tools like clang_complete for Vim code completion.

It depends on the code - while GCC, being more mature, is typically better at most optimisations, there are a few cases where Clang produces code that is a fair bit faster.

Over time, as Clang gets more mature, it will come closer and closer to parity with GCC, or surpass it.

Woah, let's not conflate compilation-speed with runtime-speed...
Read carefully and you'll see I was not.
Maybe, but your post was ambiguous with respect to its implications. Defensive writing would have had you state what you meant more carefully.
The parent was not intimating what you inferred.
Apple's been pushing developers toward LLVM for several years, and is bound to make it an app store requirement at some point.

I've observed no such 10-20% slowdown on iOS, which is more CPU-constrained than most realms. Typically, LLVM-generated code is as fast as or faster than GCC-generated code. Sometimes it's much faster.

Just a guess: couldn't it be an x86(-64) vs ARM thing? Like GCC being better on x86, and LLVM on par with it on ARM? Or maybe they tuned the compiler for the iDevices?
People forget that AOT can only get you so far. A JIT not only has all the information an AOT compiler has, but also has real data gathered at run time, and can do wonders with it.
For the sake of argument:

What is an example of an optimization that a JIT compiler can make that an AOT compiler cannot?

If the developer is able to profile the application on typical end-user workloads, don't profile-guided optimizations provide the same benefit as JIT runtime profiling?

Why can't an AOT compiler just consider every path a "hot" path?

Last but not least: Got any benchmarks?

For one: a JIT can do polymorphic inline caching (you can read more about it from Urs Hölzle, Google's senior vice president of operations[1]), while an AOT compiler can't.

Wikipedia gives a few more[2]: runtime profile-guided optimizations and pseudo-constant propagation

[1] http://research.google.com/pubs/author79.html

[2] http://en.wikipedia.org/wiki/AOT_compiler
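To make the inline caching idea concrete, here is a toy sketch in Python. It is not how any real VM implements it (a real JIT patches machine code at the call site); the `CallSite` class and the shape classes are made up for illustration. Each call site remembers the method it resolved for each receiver type it has seen, so repeat calls with a known type skip the full lookup.

```python
class CallSite:
    """Hypothetical model of one call site with a small per-site cache
    keyed on the receiver's type (a polymorphic inline cache)."""

    def __init__(self, method_name, max_entries=4):
        self.method_name = method_name
        self.max_entries = max_entries
        self.cache = {}      # receiver type -> resolved function
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        target = self.cache.get(receiver_type)
        if target is None:
            # Slow path: do the full lookup, then remember the result
            # for this type if the cache still has room.
            self.misses += 1
            target = getattr(receiver_type, self.method_name)
            if len(self.cache) < self.max_entries:
                self.cache[receiver_type] = target
        else:
            # Fast path: type seen before at this site, reuse the target.
            self.hits += 1
        return target(receiver, *args)


class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3 * self.r * self.r   # crude pi, keeps the arithmetic exact


class Square:
    def __init__(self, s):
        self.s = s

    def area(self):
        return self.s * self.s


site = CallSite("area")
shapes = [Circle(1), Square(2), Circle(3), Square(4)]
areas = [site.call(shape) for shape in shapes]
```

After the loop the site has missed once per distinct type and hit on every repeat, which is the information a JIT would use to decide what to inline at that site.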

The Polymorphic Inline Caching paper refers to AOT compiling with runtime hints.

In the case of non-dynamic languages like C and C++ that clang generally targets, are there other examples of where JIT would make things possible that are not possible in AOT?

Profile guided optimizations that are relevant for the specific invocation of the program. Loop optimization based on invocation parameters for that specific run of the program. Hard-coding in the jump target address for calling functions from dynamically loaded libraries (can't do that AOT, because if the library is replaced, the symbol offsets change).

Optimizing for the specific processor you're running on, as opposed to being forced to compile for a lowest common denominator.

A whole bunch of other small things like that.

One nice thing JITs can do that AOT compilers can't is on-stack replacement. That's where you recompile a particular function at run time based on new information and switch over to the new code, even while the function is still executing. This allows you to do speculative optimizations.

For example, you might see that branch X is always taken. So you assume that X will always be true, and add a guard, just in case, which triggers a recompilation if the assumption ever fails. You reoptimize the function on the basis of your new (speculative) information about X. This could improve register allocation, allow you to remove lots of code (other branches maybe), inline functions, etc.

Java JITs have been known to inline hundreds of functions deep with this.
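A rough sketch of the guard-plus-fallback pattern, in Python for readability. This is only a model: the names (`SpeculativeFunction`, `clamp_generic`, `clamp_assuming_in_range`) are invented here, and it falls back at function-call granularity rather than replacing a running stack frame as true on-stack replacement does. In this toy the guard costs as much as the branch it replaces; the real win comes from everything else the compiler can simplify once it assumes the branch.

```python
class SpeculativeFunction:
    """Hypothetical wrapper: run a specialized version while a guard
    holds, and fall back to the generic version for good once it fails."""

    def __init__(self, generic, specialized, guard):
        self.generic = generic
        self.specialized = specialized
        self.guard = guard
        self.speculating = True
        self.deopts = 0

    def __call__(self, *args):
        if self.speculating:
            if self.guard(*args):
                return self.specialized(*args)
            # Guard failed: the assumption was wrong, so stop speculating.
            self.speculating = False
            self.deopts += 1
        return self.generic(*args)


def clamp_generic(value, limit):
    if value <= limit:       # "branch X"
        return value
    return limit


def clamp_assuming_in_range(value, limit):
    # Only valid while branch X is always taken.
    return value


clamp = SpeculativeFunction(
    generic=clamp_generic,
    specialized=clamp_assuming_in_range,
    guard=lambda value, limit: value <= limit,
)

results = [clamp(v, 10) for v in (1, 5, 12, 3)]
```

The third call breaks the assumption, so the wrapper deoptimizes once and every later call goes through the generic code.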

A simple example: let program P do a zillion <something> * <command line argument> multiplications, and call the program every hour with argument value zero or one, depending on a coin flip. An AOT compiler would not even know that the program will never be called with other arguments. A JIT compiler could remove all multiplications.
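Something like this, as a hand-written Python sketch of what the JIT would effectively do. `make_scaler` and `generic` are made-up names, and the shortcut for zero assumes integer inputs (with floats, multiplying by zero is not always zero, e.g. for NaN or infinity).

```python
def generic(values, factor):
    """What an AOT compiler has to emit: one multiplication per element."""
    return sum(x * factor for x in values)


def make_scaler(factor):
    """Return a version of the loop specialized on the factor actually
    observed at run time. Assumes integer inputs."""
    if factor == 0:
        # Every product is zero, so the whole loop disappears.
        return lambda values: 0
    if factor == 1:
        # Multiplying by one is a no-op, so only the sum remains.
        return lambda values: sum(values)
    # Anything else: fall back to the general code.
    return lambda values: generic(values, factor)


data = list(range(1000))
results = {f: make_scaler(f)(data) for f in (0, 1, 3)}
```

For the two arguments the program is actually called with, no multiplications are executed at all, and the results match the generic loop.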

Profile-guided optimizations only work on the next run, and, when used by the developer, do not work for cases where there are widely different usage profiles for a single program. For example, most users would have data sets that fit in memory, but others will have ones that do not.

Wouldn't you get a code explosion and difficulties dealing with cache coherency if every path was a hot path (serious question, I don't know much about this stuff)?
LLVM isn't really a JIT, and I've never heard of people using it as one for C++. (Also, those that tried to use it as a JIT had lots of problems, like Unladen Swallow).
Rubinius is using it successfully though (I think anyways, does the Ruby community have benchmarking infrastructure?)
Yes. My impression is that it's the same kinda-sorta successfully as Unladen Swallow - pretty good, but nothing like LuaJIT2, V8 or SpiderMonkey.
Apparently PyPy still has work to do, we must insert ourselves into this list!