Hacker News new | ask | show | jobs
by vowelless 2333 days ago
Very shallow. I prefer the popular stack overflow answer by Mystical:

https://stackoverflow.com/questions/11227809/why-is-processi...

2 comments

A relevant question from there: Is everything I learned about big-O performance irrelevant now?

The right answer is: No, just pay attention to what you are counting. The expensive ops today are pipeline stalls, caused by cache misses of all kinds. Branch prediction failures are a kind of cache miss. Count pipeline stalls instead of (e.g.) additions, comparisons, or swaps, and then all your big-O intuitions still work.

Modern CPUs maintain counters of various causes of pipeline stalls, so you don't need to guess how many you have had.

. . .

There are techniques to transform computations to branchless forms, of which cmov is a special case. A useful primitive, for example, is

  template <typename T>
  bool swap_if(
      bool c, T& a, T& b) {
    T v[2] = {
      std::move(a),
      std::move(b) };
    b = std::move(v[1-c]);
    a = std::move(v[c]);
    return c;
  }
Then the loop body in a partition becomes

  l += swap_if(
    *r < p, *l, *r);
with no branches. (A compiler could easily translate this to two cmov instructions when T is a machine word, though none do, just yet.) This easily doubles the performance of quicksort on modern CPUs.

Curiously, the time taken when this is compiled with g++ is hugely worse if a is assigned before b, for very non-obvious reasons.

I love the explanation, but why did the guy suggest a complicated bit work instead of just doing sum += arr[i] > 128 ? arr[i] : 0 is beyond me.
That would not evoke the behavior he wanted to call attention to.
Theres still a conditional branch in that statement
That depends on the compiler, and I'd expect most to optimize this into a cmov