by rswier 3834 days ago
IMHO aggressive optimization at compile time is an example of premature optimization. Let the hardware have access to a straightforward representation. Once the run-time hot spots are identified, the hardware (firmware, VM, whatever) can rewrite the binary code to execute faster. Excessive compiler optimization makes this difficult or impossible (too much information is thrown away). Compilers should be designed for fast compilation.

That would work if most applications spent all their time in a few hot spots. But, contrary to popular wisdom, that's usually not the case. Most applications have flat profiles (to steal a quote from DannyBee—but it matches my experience as well). They have flat profiles because people have spent a lot of time optimizing them. In this context—which is the norm—eliminating optimizations to save compile time and deferring optimization to a few "hot spots" has the effect of turning off optimization for the whole program.

It's common to write off compiler optimizations as unimportant, because they're invisible and people don't see them. They're also complex, which makes people predisposed to get rid of them in the name of "simplicity". But, for better or for worse, optimizing compilers are necessary complexity.

Optimizing compilers are not ubiquitous because compiler engineers just like to play with technology. They're ubiquitous because you need them.

I agree optimization is important. So important that it should be pushed down into the hardware. Binaries should look almost like source code. But that's just my vision for what it's worth.
Hardware already does insane amounts of optimization. A modern superscalar, out-of-order processor basically does its own JIT from x86 into its own internal micro-ops, reordering instructions on the fly and so on. That's another 2-10x speed difference on modern computers.
Completely agree :) But even better to push C code straight down to the hardware and let it crunch on that! Let it allocate a few thousand registers, or spawn off an FPGA compiler to create a few new instructions. Crazy?
Hardware doesn't work like this. You might want to read Hennessy and Patterson, and the original RISC I paper.

http://www.amazon.com/Computer-Architecture-Fifth-Edition-Qu...

http://www.cecs.pdx.edu/~alaa/ece587/papers/patterson_isca_1...

RISC created a huge local minimum by speeding up C code to the exclusion of other languages. I predict that eventually future processors will hide more features from the higher software levels (such as number of registers, instruction types and formats) in order to improve efficiency at the machine level. I think we are seeing this trend with GPUs already. Current CPUs don't do this because they have to maintain binary compatibility with a huge installed base. We can compare notes in a decade or so :-)
It doesn't really seem like this solves the problem of optimizers introducing bugs and vulnerabilities.

Take the canonical optimizer-created security hole: the hardware optimizer replaces a constant-time compare (which doesn't leak timing information) with a variable-time compare (which does).
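To make the example concrete, here is a minimal sketch in C of the two kinds of compare. The function names `leaky_equal` and `ct_equal` are made up for illustration and are not from any particular library. Note that C itself makes no timing guarantees, so nothing stops an optimizer (in the compiler or below it) from turning the second form into something that behaves like the first.

```c
#include <stddef.h>

/* Variable-time compare: returns as soon as a byte differs, so the
 * running time depends on how long the matching prefix is. */
int leaky_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return 0;   /* early exit leaks the mismatch position */
    }
    return 1;
}

/* Constant-time in intent: always reads all n bytes and accumulates
 * the differences, with no data-dependent branch in the loop.
 * This is only the programmer's intent; the language does not promise
 * the generated code keeps this shape. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}
```

Both functions return the same answers; the only difference is the timing behavior, which is exactly the part an optimizer is free to change.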

I don't think this solves the problem we're setting out to solve, i.e. the optimizer introducing bugs.

That's a good point. Replacing a constant-time compare with a non-constant-time compare is a terrible bug to introduce.

A more amusing issue I saw was an optimization that looked for places where it could replace manual memory copying with a call to memcpy. It drove the guys writing libc nutzoid, because it was replacing the code in memcpy with a call to memcpy. (On some platforms you can implement memcpy with special assembly language instructions. On some you can't.)
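A rough sketch of the kind of code involved, assuming a plain byte-by-byte copy loop (the name `naive_copy` is hypothetical). An optimizer that recognizes this loop as a memory copy may replace it with a call to memcpy. That is harmless in application code, but if this function is itself the libc's memcpy, the result is a function that calls itself.

```c
#include <stddef.h>

/* Hypothetical portable fallback copy, written the obvious way.
 * A loop-idiom optimization may recognize the loop below as a memory
 * copy and emit a call to memcpy in its place. */
void *naive_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];    /* candidate for replacement by memcpy */

    return dst;
}
```

With GCC, the usual workaround I know of is to build such files with `-fno-tree-loop-distribute-patterns`, which turns this pattern replacement off. Other compilers have their own switches, so treat that flag as one example rather than a general fix.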

Personally I care little about speed, since if I need more speed I can get that. And frankly if you tell me the resulting binary is 20% faster for some things, I just do not care. But I worry a lot about losing the ability to reason about side effects.

Maybe we are all better served by faster compilers that create straightforward binaries (and fewer bugs overall, both in the compiler and in application code). Optimization researchers could focus on source-to-source transformation tools with intelligent human-in-the-loop guidance. Or else they can work at the hardware/JIT level if they prefer.

Right now compiler writers are playing in a kind of local minimum (premature optimization, as I said). This may produce 3x-5x faster binary code today, but it also forces CPU manufacturers to retain backward compatibility, which keeps them stuck in the same local well. Eventually a new architecture is created (with 10x the registers, etc.) and the cycle continues.