by mfranc42 1119 days ago
He probably meant that a compiler is a pure function for a given source file and a set of flags. Of course, if you change a compiler or flags, you can get something radically different, but you shouldn't get something radically different because it's Tuesday, midnight, your username starts with n, you have more than 3 hard drives, or because it reads some bytes from /dev/urandom.

It's a bit of a stretch, sure, but I don't expect my performance problems to go away by recompiling stuff over and over again without changing either the code or flags and expecting the optimizer to make a better decision next time. It doesn't work that way.

3 comments

> a compiler is a pure function for a given source file and a set of flags.

This is true, but not what they actually said.

> you shouldn't get something radically different because it's Tuesday, midnight [...] I don't expect my performance problems to go away by recompiling stuff over and over again without changing either the code or flags and expecting the optimizer to make a better decision next time

Maybe we should, though. I shouldn't have to be forever locked into the performance profile that matches the -O level that my distribution's package maintainers used at compile time.

I'd love to see a shift in programming systems towards a place where the language is designed with particular attention to how fast it can be processed by the toolchain to get _something_ on disk as quickly as possible and further optimization is deferred to a later stage to be fulfilled by a separate, asynchronous process. Imagine if there were no tradeoff between time spent waiting on the compiler to finish vs runtime performance, because your program no matter how large would never take more than 20 seconds* to compile. For more modestly sized programs, the effect would be the ability to test it almost immediately, but the compiler continues optimizing away all the while—to the point that you could even go to sleep on Wednesday and wake up on Thursday morning with a program that's even snappier. The expected outcome should be faster compile times and faster binaries.

* or choose your own adventure

Optimizing compilers often have budgets, so you can get different outcomes if performance jitter happens to mean that it bails from some optimization strategies.
The budgets I've seen have less been "if it's taken N seconds, bail" and more "if we've processed X million instructions, bail." At least in the compilers I work on, nondeterminism is considered a bug, although it can be harder to avoid nondeterminism than you'd think.
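
To make the distinction between the two kinds of budget concrete, here is a minimal Python sketch. It is a toy, not how any real compiler is written: the "pass" just drops `nop` entries from a list, and `drop_nops_step_budget`, `drop_nops_time_budget`, and `fake_clock` are hypothetical names invented for this illustration. A budget counted in work done gives the same output on every run, while a budget read from a clock makes the output depend on how fast the machine happened to be.

```python
import time


def drop_nops_step_budget(instructions, max_steps):
    """Remove "nop" entries, bailing out after max_steps instructions are examined."""
    out = []
    for i, ins in enumerate(instructions):
        if i >= max_steps:
            # Budget exhausted: pass the rest through unoptimized.
            out.extend(instructions[i:])
            break
        if ins != "nop":
            out.append(ins)
    return out


def drop_nops_time_budget(instructions, max_ticks, clock=time.monotonic):
    """Same pass, but the bail-out decision depends on a clock reading."""
    deadline = clock() + max_ticks
    out = []
    for i, ins in enumerate(instructions):
        if clock() >= deadline:
            # Out of time: how far we got depends on machine speed.
            out.extend(instructions[i:])
            break
        if ins != "nop":
            out.append(ins)
    return out


def fake_clock(tick):
    """Hypothetical stand-in for a machine; a larger tick means a slower machine."""
    now = 0

    def clock():
        nonlocal now
        now += tick
        return now

    return clock


program = ["add", "nop", "mul", "nop", "ret"]

# Step budget: identical input always gives identical output.
print(drop_nops_step_budget(program, 3))

# Time budget: same input, different "machine speed", different output.
print(drop_nops_time_budget(program, 10, fake_clock(1)))
print(drop_nops_time_budget(program, 10, fake_clock(4)))
```

With the step budget, the result is a function of the input and the budget alone. With the time budget, the fast clock lets the pass finish and the slow clock makes it bail partway, which is the jitter-dependent outcome described above.
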
>and expecting the optimizer to make a better decision next time. It doesn't work that way

But in practice it does work exactly that way because of PGO and if we're precluding PGO then the observation is as banal as "programs that have no side-effects are pure". Like I get that the post is trying to paint some beautiful picture about how compiler passes are abstract beautiful transformations of representations of programs but it's not a useful picture at all.

I don't remember using PGO like ever. And even if you do use it, isn't it like breaking compilation into two phases that are themselves deterministic?

It is a pretty useful picture in my eyes, because that's what I've been observing since like forever. A non-deterministic compiler would be pretty hard to test or reason about.

It might be a banal observation, but it's an important one if you want to contrast a compiler with something like an OS kernel or most modern programming projects that interact heavily with the outside world. When you throw hardware, network traffic, or users into the mix, it gets crazy.

> Like I get that the post is trying to paint some beautiful picture about how compiler passes are abstract beautiful transformations of representations of programs but it's not a useful picture at all.

It is actually a very useful picture especially to a beginner.

PGO is an edge case and not used a whole lot in practice. Many compilers do not support it at all, even production compilers. Someone learning to write a compiler does not need to think about PGO. And besides, a compiler with PGO is still a pure function of source + flags + profile.
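
As a rough illustration of "pure function of source + flags + profile", here is a toy Python sketch. Everything in it is an assumption made for the example: `compile_unit` is a hypothetical function, the "program" is just a tuple of function names, the `"hot-first"` flag is invented, and the only "optimization" is laying out hot functions first. The point is only that the profile is one more explicit input, so the same three inputs always give the same output.

```python
def compile_unit(functions, flags, profile=None):
    """Toy 'compiler': returns the layout order of functions.

    functions: tuple of function names in source order
    flags:     frozenset of option strings (the "hot-first" flag is made up)
    profile:   optional dict mapping function name -> call count
    """
    if profile is not None and "hot-first" in flags:
        # sorted() is stable, so functions with equal counts keep source order.
        return tuple(sorted(functions, key=lambda name: -profile.get(name, 0)))
    return tuple(functions)


source = ("parse", "lower", "emit")
flags = frozenset({"hot-first"})
profile = {"emit": 90, "parse": 5}

# No hidden inputs: repeating the call cannot change the answer.
assert compile_unit(source, flags, profile) == compile_unit(source, flags, profile)

print(compile_unit(source, flags))           # no profile: source order
print(compile_unit(source, flags, profile))  # profile changes the layout
```

A different profile can give a different layout, but only because an input changed, not because of anything the function read from its environment.
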

>PGO is an edge case and not used a whole lot in practice. Many compilers do not support it at all, even production compilers.

Both clang and gcc support `-fprofile-generate`. Beyond that generic infrastructure, in my area (DL compilers) if you're not doing PGO/autotuning you're not serious.

>Someone learning to write a compiler does not need to think about PGO.

That's like saying someone learning to write a compiler doesn't need to think about optimization at all - sure maybe the first time. But then every time after that it should be at the forefront of your mind.

> Both clang and gcc support `-fprofile-generate`. Beyond that generic infrastructure, in my area (DL compilers) if you're not doing PGO/autotuning you're not serious.

How many binaries in a typical Linux distro are built with PGO for example? The answer is approximately zero.

Your point? Because your first goalpost was "most compilers don't even support PGO". And now the goalpost is Linux distros? How about this one - how many binaries at FB/G are built with PGO? Answer: a great deal (llvm/bolt is a FB project and MLGO is a G project).

My point is that the view of a compiler as a series of transformation passes from a source program to machine code, implemented as pure functions, is an entirely appropriate way to describe how they work in a random introductory blog post?

Congratulations on your recent PhD in quantum machine learning or whatever -- I hope you find some way to speed up FB/G's ad serving algorithms by 1% and land that promotion. I'm sure you're very smart.