Hacker News new | ask | show | jobs
by alextingle 4586 days ago

    for (unsigned i = 0; i < N; ++i) {
      for (unsigned j = 0; j < i; ++j) {
        obj->tick(j);
      }
    }
I wouldn't go quite so far as to say that benchmarks with tight inner loops like this are completely useless, but they are nearly so.

The author is clearly aware that the real world of performance is much bigger & more complex than his simple Petri dish. Credit to him for mentioning that. It's also really refreshing to see him analysing the optimised assembly.

The trouble with this approach is that it's tempting to draw simple conclusions. In this case, you might be tempted to conclude "CRTP always faster than virtual dispatch", when the truth is likely to be much more situation dependent.

I have seen a biggish project go though a lot of effort to switch to CRTP, only to see a negligible performance impact.

2 comments

And I have seen projects whose performance was crippled by layers upon layers of endless virtual calls. YMMV ;-)
Agreed, for almost all code it doesn't matter, but for the remaining small fraction it's worth thinking about these things. It sounds pretty insane to go with a blanket approach of removing virtual calls throughout an entire codebase without understanding which ones are the problematic ones. Especially since some ways of solving the problem could potentially lead to other problems like increased compiled code size.

I've seen plenty of software (especially systems software) that does spend much of it's time in tight inner loops. Pulling out all the optimization stops there can give measurable gains. I've personally seen measurable gains on real applications from tricks like reordering branches so that the more predictable branches go first.

Sure, it's a waste to optimize code that doesn't significantly contribute to execution time. And there are lots of cases that are I/O bound, memory bound, cache bound, cross-thread communication bound etc. But if you're doing actual calculations - so not lots of communication like I/O or threading, and not delegating the crunching to a library - and your calculations are not trivial bit-pushing (e.g. not just streaming with minor changes), then it's a good bet that any virtual function calls in that kind of code will be problematic; getting rid of the inner-loop dynamic dispatches will almost certainly help.

So it's situational, but IME it's pretty predictable where you'll see this kind of optimization help. By all means profile and use whatever tools at hand to help you along, and don't apply the optimization blindly - but despite the whole "black art" label optimization sometimes gets this kind of thing really is pretty straightforward.

Yes, of course. But the real challenge is to determine that it's your virtual indirection that's causing the problem. Once you know what's causing the problem, fixing it is (relatively) easy.

The danger is that benchmarks like this encourage naïve programmers to use complex constructs as a matter of course, when simpler would usually be better. "Premature optimisation is the root of all evil" and all that...

I have never heard of any project where virtual calls are the dominant factor in performance.

Are there any open source projects amongst your examples?

It's not necessarily done with explicit 'virtual' calls in C++, but speed of dispatch is very important to interpreters for dynamic languages. Here's an improvement to Python where the same general type of fixes make a significant difference: http://bugs.python.org/issue4753
Good example.
Also, CRTP prevents you from storing all derived objects in a single container, since the underlying types are now heterogeneous. There are also more restrictions present in terms of slicing, casting, among other things that render CRTP a poor choice in many situations.

In the end, it's just another tool which is the right one in particular circumstances, and the wrong one in all others.