| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by falcolas 3514 days ago
	The small difference between direct C calls and virtual C++ calls surprised me, actually. I thought it would be much bigger.

4 comments

currysoup_t 3514 days ago

As usual, memory access is the expensive operation. If the virtual function is already in L1 cache a virtual function should be only a minor slowdown. If it's all the way off in main memory it will be significantly slower.

Eric Brumer has a great talk on this: https://channel9.msdn.com/Events/Build/2013/4-329

link

eloff 3514 days ago

Exactly. Calling a virtual function in a tight loop, almost indistinguishable from a direct (non-inlined) function call. Calling a virtual function every now and then, much more expensive, but also not as likely to matter much to the performance of your program on the whole anyway.

Still worth noting that every VM like Java or .NET or LuaJIT will optimize for the case that a virtual call usually has only one or two commonly invoked target functions. They will use a conditional fast-path for the common case(s) and fall back to slower virtual calls or less optimized paths for megamorphic calls. I don't know if any C++ compilers do this kind of thing, but I would be surprised if they did not.

link

twoodfin 3513 days ago

Some C++ compilers support "fast path" devirtualization via profiling feedback:

http://hubicka.blogspot.com/2014/04/devirtualization-in-c-pa...

As you suggest, it's a little misleading to talk about the runtime cost of a virtual or indirect function call in isolation. The most significant cost is often the resulting inability of the compiler to inline and perform further optimizations across the caller and callee.

link

eloff 3513 days ago

Nice, I had a feeling they did that, thanks for the link.

I fully agree, missed optimization opportunity from not being able to inline is usually a bigger effect, because loops dominate the runtime and that's where those optimizations count.

link

exDM69 3513 days ago

Virtual function / function pointer calls can still have a big performance impact, especially if it could be inlined away.

Compare C standard library qsort with C++ std::sort performance for sorting integers. The difference is huge in favor of std::sort. If you write your own qsort and put it in a header file so the indirect calls can be inlined, they should be equal.

link

jheriko 3513 days ago

i've seen virtuals completely removed and inlined by gcc when there is enough information for it to do that, and a simple enough use case... probably other compilers are similar.

link

brigade 3514 days ago

Even then, the article and co-commentators exaggerate the cost and factors you should worry about for virtual calls. Unless you're optimizing specifically for an in-order CPU, the only cost of virtual functions that really matters in the general/high level case is not being able to inline them.

link

a_e_k 3514 days ago

An important factor is whether it can be branch predicted. If the virtual call nearly always goes to the same method implementation (e.g., always an OpenGL renderer or always a DirectX renderer), it can actually be pretty cheap. They get expensive mainly when the call is constantly jumping to implementations for different object types.

link

acchow 3514 days ago

A virtual call in C++ is just a vtable lookup then a direct C call. Literally one pointer away. (or perhaps one pointer then one integer addition away? For the offset)

link

no-bugs 3513 days ago

However, the address of this C call is calculated, which means that branch prediction (and the cost of potential misprediction) is involved. So additional cost we're talking about, is roughly a cost of L1 read (3-4 clocks by itself) - plus (0-<cost-of-branch-misprediction))

link

chrisseaton 3514 days ago

Is it that simple for virtual methods in complex class hierarchies? If the class hierarchy isn't linear, how can you assign simple absolute vtable indices for methods?

link

dom0 3513 days ago

Yes it is. The C++ ABI defines how any permitted inheritance hierarchy is mapped into a surjective, ordered set of vtables. Also, calls into methods of other classes in the hierarchy offset the this pointer accordingly.

It feels counterintuitive that these two mechanisms are sufficient; but they are.

link

jackmott 3514 days ago

one pointer can be as slow as a sqrt if it isn't in cache.

link

gpderetta 3514 days ago

The cost of virtual calls (and calls in general) in the article is greatly exaggerated. Jumps, calls and indirect calls have an effective latency of zero as they are resolved in the fetch stage (and speculated if the target is not known) and a troughput of one taken jump every one or two cycles.

In a virtual call, the offset computation and load is only used to compute the jump target (which is only used later on commit), which means that is not on the critical path [1].

They consume execution and load bandwith, but that's not normally the bottleneck. The cost of virtual cals is usually due to missed optimizations and the possibility of misprediction.

[1] technically if the load is delayed long enough, the CPU might be prevented from committing the speculated instructions and eventually exhaust its reoerder buffer and stall.

link

no-bugs 3513 days ago

> The cost of virtual cals is usually due to missed optimizations and the possibility of misprediction.

Yep (plus one L1 read to get that VMT pointer). And the OP is pretty much consistent with this (at least within declared "indications of the order of magnitude" precision).

link

gpderetta 3513 days ago

No, the load is not on the critical path, so it is not normally part of the overhead.

link