by pslam 4585 days ago
If the branch target is an address loaded from memory, and there is no cached result for the branch instruction, then the CPU has no way to predict which instruction to execute next. The target could be anywhere in valid memory.
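
For anyone who wants to see what "a branch target loaded from memory" looks like at a call site, here is a minimal C sketch of a hand-rolled virtual call. All names (shape, shape_ops, total_area) are hypothetical and only for illustration; a C++ compiler typically emits something similar for a virtual call, but the exact layout is up to the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: shape, shape_ops, total_area. */
struct shape;

struct shape_ops {
    int (*area)(const struct shape *self); /* plays the role of a vtable slot */
};

struct shape {
    const struct shape_ops *ops;           /* plays the role of the vptr */
    int w, h;
};

static int rect_area(const struct shape *s)     { return s->w * s->h; }
static int triangle_area(const struct shape *s) { return s->w * s->h / 2; }

static const struct shape_ops rect_ops     = { rect_area };
static const struct shape_ops triangle_ops = { triangle_area };

/* Sums the areas of n shapes through the function-pointer table. */
int total_area(const struct shape *const *shapes, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        const struct shape *s = shapes[i];
        assert(s != NULL && s->ops != NULL);
        /* Two dependent loads (s->ops, then ops->area) followed by an
           indirect call. Without a cached prediction for this call site,
           the front end cannot know the target until those loads finish. */
        sum += s->ops->area(s);
    }
    return sum;
}
```

Usage would be along the lines of building an array of `const struct shape *` that mixes `rect_ops` and `triangle_ops` objects and passing it to `total_area`.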

The measurements don't show this because the branch in a micro-benchmark is predicted very well. In fact, it's quite difficult to defeat prediction even in giant codebases, and by that point you probably have bigger problems with L1 thrashing. The more subtle problem is that, even with prediction, there's a (quite high) limit on the number of unretired speculated instructions. Again, a micro-benchmark won't show that; you'd need a large function in the inner loop.
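
As a rough illustration of why a micro-benchmark predicts so well, compare a call site that always hits the same target with one whose target changes in a pseudo-random order. This is only a sketch in C with made-up names (run_fixed, run_mixed, ops); whether the second loop actually defeats a given predictor depends on the hardware, and no timing claim is made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*op_fn)(uint32_t);

static uint32_t add_one(uint32_t x)   { return x + 1; }
static uint32_t add_two(uint32_t x)   { return x + 2; }
static uint32_t add_three(uint32_t x) { return x + 3; }
static uint32_t add_four(uint32_t x)  { return x + 4; }

/* Hypothetical dispatch table shared by both loops. */
static const op_fn ops[4] = { add_one, add_two, add_three, add_four };

/* Same target on every iteration, which is what a typical micro-benchmark
   ends up measuring. Note that an optimizing compiler may even turn this
   into a direct call or inline it, so a real benchmark would have to hide
   the target from the compiler. */
uint32_t run_fixed(size_t iters)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < iters; i++)
        acc = ops[0](acc);
    return acc;
}

/* Target picked by a xorshift32 generator, so the sequence of targets at
   this one call site has no short repeating pattern. */
uint32_t run_mixed(size_t iters, uint32_t seed)
{
    assert(seed != 0); /* xorshift stays at zero if seeded with zero */
    uint32_t acc = 0, state = seed;
    for (size_t i = 0; i < iters; i++) {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        acc = ops[state >> 30](acc); /* top two bits select one of 4 targets */
    }
    return acc;
}
```

Timing the two loops with a large iteration count is left out on purpose, since the numbers would be specific to one machine and compiler.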

I'm making it sound like there's no cost to virtual functions in real applications, but there is one; it's usually measurable, and every little bit adds up. If anything, I think a better reason not to simply spray "virtual" everywhere is that it demonstrates the author didn't understand the data structures they created.

1 comment

On the other hand, having no cached result for the branch probably correlates strongly with not having the target in I-cache, which means you may be stalling out anyway. It also implies that the branch is not in the middle of a tight loop.

Regarding the size of the ROB (reorder buffer), I was wondering about it a while ago and found an interesting post from someone who measured its capacity, in entries, across several generations of processors, mostly Intel (Palermo is an AMD part): Ivy Bridge (168), Sandy Bridge (168), Lynnfield (128), Northwood (126), Yorkfield (96), Palermo (72), and Coppermine (40). http://blog.stuffedcow.net/2013/05/measuring-rob-capacity/

I agree with you on the virtual part. I'm actually a C programmer more interested in how to implement efficient dispatch for interpreters. Eli (the original author) has some good posts on that as well: http://eli.thegreenplace.net/2012/07/12/computed-goto-for-ef...
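
To show what I mean by dispatch for interpreters, here is a minimal sketch of the two usual styles in C: a plain switch loop, and computed goto using the labels-as-values extension in GCC and Clang. The opcode set and function names (run_switch, run_goto) are made up for this example and are not taken from the linked post. The usual argument, which I'd treat as hardware dependent, is that the switch funnels every opcode through one indirect branch while computed goto gives each handler its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set for a one-register toy VM. */
enum { OP_HALT, OP_INC, OP_DEC, OP_DOUBLE, OP_COUNT };

/* Baseline: a switch in a loop. Compilers commonly lower this to a jump
   table with a single indirect branch shared by all opcodes. */
int run_switch(const unsigned char *code, int acc)
{
    for (size_t pc = 0;; pc++) {
        assert(code[pc] < OP_COUNT);
        switch (code[pc]) {
        case OP_HALT:   return acc;
        case OP_INC:    acc += 1; break;
        case OP_DEC:    acc -= 1; break;
        case OP_DOUBLE: acc *= 2; break;
        }
    }
}

#if defined(__GNUC__)
/* Computed goto: each handler ends with its own indirect jump.
   Relies on the labels-as-values extension, so it is not standard C. */
int run_goto(const unsigned char *code, int acc)
{
    static void *const targets[OP_COUNT] = {
        [OP_HALT] = &&do_halt, [OP_INC]    = &&do_inc,
        [OP_DEC]  = &&do_dec,  [OP_DOUBLE] = &&do_double,
    };
    size_t pc = 0;
#define DISPATCH() \
    do { assert(code[pc] < OP_COUNT); goto *targets[code[pc++]]; } while (0)
    DISPATCH();
do_inc:    acc += 1; DISPATCH();
do_dec:    acc -= 1; DISPATCH();
do_double: acc *= 2; DISPATCH();
do_halt:   return acc;
#undef DISPATCH
}
#else
/* Fallback for compilers without the extension: reuse the switch loop. */
int run_goto(const unsigned char *code, int acc)
{
    return run_switch(code, acc);
}
#endif
```

Both functions take a bytecode array terminated by `OP_HALT` plus a starting accumulator, and should return the same value for the same program.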