Hacker News new | ask | show | jobs
by perryizgr8 1035 days ago
Why would an unconditional print have any effect on whether the branch predictor is invoked or not? The if statement is there in both cases, so branch prediction should kick in for both. I didn't find an explanation for this behaviour in the article.
2 comments

The go compiler might have a heuristic where a branch with IO is considered cold compared to a branch without.

In the original, it essentially faces

    If <cond>:
        Mov
    Else:
        Noop
Considering these branches unpredictable, it generates a CMOV.

With

    If <cond>:
        Mov
    Else:
        Print
It now considers the first branch hot and the second cold, and thus branch predication valuable, and generates a branch instead.

Turns out for the use case choice (1) is a misfiring, as the branch is extremely predictable, so all the conditional move does is create an unnecessary data dependency.

It’s not necessarily the wrong choice in general, as for unpredictable branches cmov will usually be a huge gain, they’ll incur a cycle or two of latency but save from 15+ cycles of penalty on a mispredicted branch (which if the prediction only works half the time is an average 7.5 cycles per iteration).

You can find older posts which demonstrate that side of the coin e.g. https://owen.cafe/posts/six-times-faster-than-c/

> Turns out for the use case choice (1) is a misfiring, as the branch is extremely predictable

Happens to be extremely predictable for this data. In general, over all possible inputs, it’s not extremely predictable.

If you assume all inputs are different (not something the compiler can assume, of course) the probability of having to update the max value goes down from 1 for the first iteration to 1/n for the last, so, possibly, the loop should be split into two halves. Go through the start of the sequence assuming the value needs updating more often than not, and switch to one where it doesn’t at some point.

For truly large inputs you could even add heuristics looking at how much room there is above the current maximum (in the limit, if you’ve found MAX_INT, you don’t have to look further)

Sorting programs used to have all kinds of such heuristics (and/or command line arguments) trying to detect whether input data already is mostly sorted/mostly reverse sorted, how many different keys there are, etc. to attempt avoiding hitting worst case behavior, but I think that’s somewhat of a lost art

Thank you. I was struggling to understand what was going on after reading the article. Your explanation is much better.
It has to make a decision to jump over the print sequence or not, whereas without the print it can eliminate the branch with a conditional move and only have to predict the loop variable. It just makes a bad guess as to whether branch prediction or conditional move will be faster, as explained above by tylerhou.