Hacker News new | ask | show | jobs
by oconnor0 3309 days ago
> Additionally, the compilers can and do replace algorithms if users let them. Most users don't want it, even if it makes code faster.

What compilers do this? With what algorithms?

3 comments

If you look at modern JIT compilers such as Truffle-Ruby, then yes it will change algorithms. e.g. changing sort algorithms depending on data size. if the data is small enough it will do a bubblesort on registers, when its big it will do TimSort on heap data etc...[1]

[1]: http://chrisseaton.com/rubytruffle/small-data-structures/

This example is about the compiler changing between (existing) internal versions of sorting algorithms for language data structures.

Not about changing what algorithms the user coded.

Exactly, that's two completely different things. It's akin to a compiler using the famous "switch statement becomes if-else chain if less than 7 items, otherwise lookup table".

The parent posts are referring to macro-level optimisations, where large changes to how a system works are implemented, in order to get massive gains in performance. Picking out the essence of the larger implementation would be a difficult task for a compiler, but more importantly it wouldn't have enough information about the context of the application to make optimisation decisions. Things like: do we need to read from that file each time we need this piece information, or is it a once-off that can be cached? Unless this contextual information is expressed somehow (and it basically never is), only the programmer will be able to make the change.

    def clamp(num, min, max)
      [min, num, max].sort[1]
    end
being turned into

    def  clamp(num, min, max)
       if (num < min)
       {
         if (num < max) 
             return num; 
         else 
             return max;
       }

       if (min < max)
         return min;
      else return max;
   end
Seems to me to be algorithm rewriting. As is superword optimization and partial evaluation. All things that Truffle-Ruby + Graal do today.
Sure, and at this level, I would also argue that compilers are as good as humans at optimisation (likely better than almost all, actually). These arguments were constantly going on in the 80s and 90s, where (assembly-enthusiast) people vehemently argued that machines (C compilers) will never optimise things as well as humans. While the argument still exists in some form today, it's certainly died down a lot.

That said, in my own limited experience, everyday optimisation problems tend to exist at a much higher level, on the high-level approach or "what are we doing" level. Perhaps these are "obvious" to some, or below the level of discussion here. But essentially, a compiler can optimise away endlessly at a piece of code, but will never beat code that shouldn't exist in the first place. My comment above was that a compiler has insufficient information to make decisions about what's truly wasteful or useless.

As an example, no compiler today will come up with a Courgette update [1] by itself. And the day it does, I think we can pack up our bags and go home (hopefully in a nice, comfy retirement kind of way).

[1] https://www.chromium.org/developers/design-documents/softwar...

> These arguments were constantly going on in the 80s and 90s, where (assembly-enthusiast) people vehemently argued that machines (C compilers) will never optimise things as well as humans.

I still argue this way. But these kinds of optimizations are tedious and time-intensive (thus costly). Also dependent on the architecture, i.e. when some extensions (say new SIMD instructions) are added to the instruction set, the compiler sometimes is able to use them automatically (though typically in a very sub-optimal way), while hand-optimized assembly code has to be rewritten (costly). Also if you port your program to a new CPU architecture (say Intel -> ARM or ARM-A32 -> ARM-A64) hardly anything can be reused. Finally tight hand-optimized code tends to be much harder to add features to than more high-level code (say, C code).

So I believe these vehemently arguing people are right (and have always been). But this does not contradict the fact that in many cases a very suboptimal code generated by the (say C) compiler is often fast enough and C is much more economical to use.

I agree with this, reading it I feel once again the limitations of written communication forms.

However, the top level discussion is about is it worth using an optimizing compiler. Or in my example is it worth using TruffleRuby instead of MRI. I would argue yes because that optimizing compiler gets you 10x more perf per cpu.

I mean that's basically constant folding on steroids, in this case taking advantage of the fact it knows the array has three elements, and that we only care about entry 1. Which is impressive, really impressive, but it is fundamentally limited. This approach won't ever be able to take your bubble sort implementation and turn it into a quick sort or merge sort. In general I don't think this sort of optimization will lead to asymptotic improvements in runtime, excluding certain edge cases.

Which isn't to say these optimizations aren't important and can lead to very real improvements in speed, I just wouldn't consider it algorithm rewriting in the strictest sense.

The only way to automatically replace nontrivial algorithms safely to provide a formal proof of each one then do a library match.
https://books.google.com/books?id=u38h_fV3UqgC&pg=PA173&lpg=...

See also the citations to it, http://dl.acm.org/citation.cfm?id=340757

See also https://www.researchgate.net/publication/221200232_Algorithm...

http://ieeexplore.ieee.org/abstract/document/5447280/

http://perso.ens-lyon.fr/christophe.alias/pub/alias_thesis.p...

https://www.ideals.illinois.edu/handle/2142/45394

http://gac.udc.es/~juan/papers/toplas.pdf

and ...

As the last paper says: "e. The development of techniques for automatic recognition of computational kernels such as inductions, reductions and array recurrences has been an intensive research area in the scope of compiler technology during the 90’s."

If you want a simpler example, all of your compilers are usually replacing your divisions by a different algorithm :) It replaces your sum reductions, recurrences, etc.

Seriously, these are really really old compiler problems. Compilers do the parts people care about, and will often replace the algorithms that are usually "slow", like math, or have well defined semantics that won't cause api breakage for you. Compilers replace memcpy, printf, etc, all the time.

If you want a compiler that replaces your much higher level algorithms, as the papers show, they have existed over the years. When the research papers are scanned copies, it's old tech :P

But the truth is that, for example, a lot of common languagers (C++, etc) are not good languages for that. It's not about compiler technology. I can teach a compiler to recognize your sorts, for example, pretty easily. But proving i can legally change it and not destroy the programming semantics is often literally not possible. You'd have to tell me "it's okay". It's only stuff that is well specified enough (like standard library functions) that i could do.

Think about trying to swap STL data structures for some code. We can recognize when it is possible. We can even determine when it is profitable, and do it. Usually, the problem is it destroys the API you've built to do so (IE you pass it around as an argument), etc. and in the end, it's usually not worth it for those languages to do it in the compiler. It's infinitely more effective to do something like http://lists.llvm.org/pipermail/llvm-dev/2016-April/098355.h... and tell people "you should switch this datastructure". At least, for these languages.

But hey, if you want the compiler to duplicate and inline all the functions and swap out the algorithms, it could!

It's just that, for example, C and C++ are not languages that lends itself to compilers doing it, it's one that lends itself to telling humans to do it (or at least, asking them if they want it done and then doing it)

This is obviously all easier in functional languages, and they actually have a fairly rich history of doing algorithm replacement :)

> replacing your divisions by a different algorithm

No they're not. I don't actually specify the algorithm to be used for division, I just write "/" (usually), and so the compiler (or runtime or whoever) is certainly free to implement that "high level" specification (I want these two quantities divided) in any way it sees fit.

I'm really unsure why you want to be this pedantic, but okay. You are making a lot of assumptions here. My point is that bunch will idiom recognize division algorithms and change them. So yes, yes they do!
> pedantic

I am being precise where you are incredibly sloppy.

> You are making a lot of assumptions here.

Really? Which assumptions am I making? Try to be precise.

Considering the fact that you are calling everyone else idiots who don't have any data, the fact that what you write has (a) no data to back it up and (b) is usually trivially, obviously and comically wrong doesn't exactly help your argument.

> ...will idiom recognize division algorithms and change them

Really? Which ones? How is this useful? How many real-world programs actually code a division "algorithm"? I don't think I've seen that once in the last 20+ years or so, but of course YMMV.

UPDATE: Last I remember, "idiom recognition" was invented for the sole purpose of cheating on industry standard benchmarks, for example replacing the Dhrystone with a computed result. Has this changed?

As a very simple example see if you can spot the loop in the optimised version of a sum to n function[0] (hint: you can't). I don't know to what extent this scales up to more complex programs, but it is at least possible in principle.

[0]: https://godbolt.org/g/dRUYA3

It does not work very well:

> https://godbolt.org/g/qmQhLK

Exercise: Write an assembly function that is faster than "-O3" and loop-free. Should be easy.

UPDATE: For those who don't want to do the exercise: The Intel compiler (icc) optimizes it in about the way that I would if I were to write it in assembly:

> https://godbolt.org/g/LntTnY

Which is slower on non-Intel and older CPUs. Add -march=native next time. Nets you a vector version. If you disable sse it generates essentially what icc does with less reliance on ucode.

See, rep is a workaround for older AMD CPU branch predictor. In addition gcc does not use the highly opcoded variant of lea with offsets because it is slow on older Intel.

GCC version makes for much better speculative execution too due to the use of the adder.

> Which is slower on non-Intel and older CPUs. Add -march=native next time. Nets you a vector version.

Accepted (though the central advantage of SHLX (with-march=native) and SHL (no -march=native) for this example lies in the greater flexibility of register parameters). To my defense I only have a computer whose processor supports BMI2 for a few months now - so I could not play around with BMI2 before. Otherwise I am sure I would have known such tricks.

> See, rep is a workaround for older AMD CPU branch predictor. In addition gcc does not use the highly opcoded variant of lea with offsets because it is slow on older Intel.

I know that. My personal code philosophy is to avoid such hacks for circumventing performance bugs in outdated processors.

By outdated you mean 5 year old? This is not for ancient Athlons. Those were 64 bit chips already.

Likewise preferring microcoded lea shafts everything older than Haswell on Intel side. Not to mention modern Atom.

Yes but the point was that compilers do changes algorithms.