by Taniwha
4585 days ago
I once worked on a serious x86 clone - we took a lot of real-world traces and ran them through our various microarchitectures to see how they would fly. Dynamic C++ dispatch was interesting. Normally you expect something like:

  mov r1, n(bp) ; get vtable
  mov r2, n(r1) ; get method pointer
  call (r2)     ; call

That's a really bad pipe break: a double indirect load and a call - but branch prediction may be your friend ... However, some of the code we saw (I think it came from a Borland compiler) did this instead:

  mov r1, n(bp) ; get vtable
  push n(r1)    ; get method pointer
  ret           ; call

That costs an extra memory write/read, but one that's always caught in L1, and on the register-poor x86 it saves a register, right? ... But on most CPUs of the time you're screwed for branch prediction. CPUs had a return cache, a cheap way to predict the branch target of a return. By doing a return without a call you've popped the return cache, leaving it in a bad state - EVERY return in an enclosing method is going to mispredict as well. The code will run, but slowly.
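
To make the return cache effect concrete, here is a toy model in Python. It is only a sketch under assumptions of my own, not a description of any real CPU: the predictor is a plain stack that pushes on every call and pops on every ret, and the push/ret pair is treated as a jump into the method, so the method returns straight to the dispatching function's caller. All names and labels (`count_mispredicts`, `ret_0`, `method_entry`, ...) are made up for illustration.

```python
def count_mispredicts(events, depth=16):
    """Replay ('call', return_addr) / ('ret', actual_target) events
    through a toy return-address stack and count wrong predictions."""
    ras = []      # the predictor's private stack of expected return targets
    misses = 0
    for kind, addr in events:
        if kind == "call":
            ras.append(addr)
            if len(ras) > depth:
                ras.pop(0)            # oldest entry falls off the bottom
        elif kind == "ret":
            predicted = ras.pop() if ras else None
            if predicted != addr:     # predicted target differs from the real one
                misses += 1
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return misses


def nested_calls(n):
    # n enclosing calls, each leaving its return address on the stack
    return [("call", f"ret_{i}") for i in range(n)]


def nested_returns(n):
    # the matching returns, innermost first
    return [("ret", f"ret_{i}") for i in reversed(range(n))]


ENCLOSING = 3

# Dispatch with a real call: the method returns to the call site,
# then every enclosing frame returns normally.
with_call = (nested_calls(ENCLOSING)
             + [("call", "ret_site"), ("ret", "ret_site")]
             + nested_returns(ENCLOSING))

# Dispatch with push/ret: the ret "returns" into the method with no
# matching call, so the predictor's stack is off by one from then on.
with_push_ret = (nested_calls(ENCLOSING)
                 + [("ret", "method_entry")]
                 + nested_returns(ENCLOSING))

print("call dispatch mispredicts:    ", count_mispredicts(with_call))
print("push/ret dispatch mispredicts:", count_mispredicts(with_push_ret))
```

In this model the call version never mispredicts, while the push/ret version misses on the fake return and then on every enclosing return after it, because each pop hands back the address meant for the frame one level further out.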