by nkurz
4585 days ago
"Best case, the core may still block predicted execution shortly after due to running out of non-dependent instructions, until it knows for sure the address it should have branched to. Worst case, the branch can't proceed until the two memory accesses complete."

You seem very familiar with these issues, but this doesn't sound right to me. Maybe I'm not understanding your terminology, but don't all modern processors support speculative execution? All instructions (including dependent ones) are executed, but the results are held in the reorder buffer until the branch outcome is confirmed. If this is still a large issue, why don't Eli's measurements show it?
|
The reason the measurements don't show it is that a micro-benchmark predicts very well. In fact, it's quite difficult to defeat prediction even in giant codebases, and you probably have bigger issues with L1 thrashing at that point. The more subtle problem is that even with prediction, there's a (quite high) limit on the number of unretired speculated instructions. Again, a micro-benchmark won't show that up; you'd need a large function in the inner loop.
I'm making it sound like there's no cost to virtual functions in real applications, but it's there, it's usually measurable, and every little bit adds up. If anything, I think a better reason not to simply spray "virtual" everywhere is that doing so demonstrates that the author didn't understand the data structures they created.
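
To make the prediction point concrete, here is a minimal sketch of the kind of loop a micro-benchmark tends to measure. The names (`Shape`, `Circle`, `Square`, `make_shapes`, `sum_weights`) are hypothetical and are not taken from Eli's code. With `mixed == false` every virtual call lands on the same target, which is the easy case for the predictor; with `mixed == true` the target varies pseudo-randomly, so the indirect branch should be much harder to predict, assuming the compiler doesn't devirtualize the call.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <random>
#include <vector>

// Hypothetical class hierarchy, for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual std::uint64_t weight() const = 0;
};
struct Circle : Shape {
    std::uint64_t weight() const override { return 1; }
};
struct Square : Shape {
    std::uint64_t weight() const override { return 2; }
};

// Build n objects. If mixed is false, all objects are Circles, so the
// virtual call in sum_weights always has the same target. If mixed is
// true, each object is a Circle or a Square chosen pseudo-randomly.
std::vector<std::unique_ptr<Shape>> make_shapes(std::size_t n, bool mixed) {
    std::mt19937 rng(12345);  // fixed seed so runs are repeatable
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        if (mixed && (rng() & 1u)) {
            shapes.push_back(std::make_unique<Square>());
        } else {
            shapes.push_back(std::make_unique<Circle>());
        }
    }
    return shapes;
}

// The inner loop: one virtual call per element, with a tiny body.
std::uint64_t sum_weights(const std::vector<std::unique_ptr<Shape>>& shapes) {
    std::uint64_t total = 0;
    for (const auto& s : shapes) {
        total += s->weight();
    }
    return total;
}
```

Timing `sum_weights` over both kinds of vector (for example with `std::chrono::steady_clock`) is one way to see how much of the measured cost is prediction. Because the function bodies here are tiny, this sketch says nothing about the limit on unretired speculated instructions; that would need a much larger function in the loop.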