Hacker News new | ask | show | jobs
by bakul 1033 days ago
Processor "optimizations" can produce surprising effects. The problem is these optimizations are not programmatically accessible to C (or most modern programming languages) given their simple memory model. Deterministic performance is not easy to obtain. My view is to not bother with such tricks unless absolutely necessary (and be prepared that your changes may actually pessimize performance on a future processor or a compatible processor by a different vendor).

If you are interested in this sort of thing, check out comp.arch!

2 comments

I've been researching this pretty deeply for the last few years, and I've come to the conclusion that, without a complete redesign, most popular programming languages cannot have direct control of these optimizations in an ergonomic manner.

The reasonI think this is because: Most languages target C or LLVM, and C and LLVM have a fundamentally lossy compilation processes.

To get around this, you'd need a hodge podge of pre compiler directives, or take a completely different approach.

I found a cool project that uses a "Tower of IRs" that can restablish source to binary provenance, which, seems to me, to be on the right track:

https://github.com/trailofbits/vast

I'd definitely like to see the compilation processes be more transparent and easy to work with.

I would agree, but it's hard to argue with a factor 2 performance boost.

But these kind of tricks feel like we need to con the compiler into optimising this correctly, which is of course ridiculous. What we probably need instead is if-statements that we can tell what's most likely the correct prediction.

Something like:

  if v > maxV predict true
    maxV = v
    continue
It's only factor 2 with an increasing array though. At which point you can just take the last element, that's way faster.

So really you end up having to make assumptions about the input to get the performance boost.

The text does point out that the branch is somewhat predictable even for a random array. In that case, the odds of having seen the array-maximum increase as you scan through the array. For example, on a random array I would predict the first iteration to take the branch 1/2 of the time, but the last iteration to take the branch only 1/N of the time.

The CPU's branch predictor won't be able to perform that kind of algorithmic analysis, but patterns like the above also work reasonably well for simpler heuristics like 'predict the same outcome as the last time the branch was taken'.