Hacker News
by kevingadd 4585 days ago
I've seen interface calls get inlined on the modern CLR when looking at the disassembly. I don't understand why you would have expected the interface call to be as fast as a normal non-virtual call, though. The interface call always needs a type check before the inlined call body, in case the object turns out to be a different implementation, so it can't be as fast as a normal call.

Or are you saying the interface call is 5x slower than a virtual call? That definitely isn't right.
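To make the type-check point concrete, here is a hand-written C++ analogue of guarded devirtualization: test the receiver's exact type, run the inlined body on a match, and fall back to an ordinary virtual call otherwise. C++ virtual dispatch stands in for CLR interface dispatch, and IAdder, IntAdder, ScaledAdder and AddGuarded are hypothetical names. It's a sketch of the idea, not what the CLR JIT actually emits.

```cpp
#include <typeinfo>

// Hypothetical stand-in for a C# interface: an abstract base class.
struct IAdder {
    virtual ~IAdder() = default;
    virtual int Add(int a, int b) const = 0;
};

// Hypothetical implementation the guard "expects" to see.
struct IntAdder final : IAdder {
    int Add(int a, int b) const override { return a + b; }
};

// Hypothetical other implementation; calls on it always take the fallback.
struct ScaledAdder final : IAdder {
    int Add(int a, int b) const override { return a + 2 * b; }
};

// Guarded devirtualization written out by hand. A JIT would compare the
// object's type (method-table) pointer; typeid is the portable C++ stand-in.
int AddGuarded(const IAdder& x, int y, int i) {
    if (typeid(x) == typeid(IntAdder)) {
        return y + i;       // fast path: IntAdder::Add's body, inlined
    }
    return x.Add(y, i);     // slow path: ordinary virtual dispatch
}
```

An IntAdder receiver takes the inlined fast path and a ScaledAdder receiver takes the fallback. Either way the compare-and-branch is still executed, which is the extra cost a plain non-virtual call doesn't pay.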


I'd expect an inlined interface or virtual call to perform the same as an inlined non-virtual call. But the CLR (4.5, Windows 7 x64, with either 32- or 64-bit codegen) won't inline a virtual or interface call to int Add(int, int), so it's slower.

In my simple program, which calls an Add function through an interface in a loop, it is definitely making a real function call each time. The JIT unrolls the loop 4 times and loads the function pointer once per iteration; I'd have thought it would only load it once overall. The loop is 89 bytes, and there is no conditional inside it to check the type.[1]

If I change it to call Add directly (without casting to the interface type), the call is inlined and the loop is unrolled. The loop is 34 bytes.[2]
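Roughly, "unrolls 4 times" means the JIT emits four copies of the loop body per trip around the loop, plus some handling for leftover iterations. A minimal C++ sketch, assuming the call has already been inlined so Add(y, i) is just y + i, and assuming i runs from 0 to n - 1 (the post doesn't give the bounds; SumRolled and SumUnrolled4 are hypothetical names):

```cpp
// Rolled loop, assuming Add(y, i) has already been inlined to y + i.
// The bounds (i from 0 to n - 1) are an assumption for illustration.
int SumRolled(int n) {
    int y = 0;
    for (int i = 0; i < n; ++i) {
        y = y + i;
    }
    return y;
}

// Same loop unrolled 4x: four copies of the body per trip, then a
// remainder loop for the 0-3 iterations that don't fill a full group.
int SumUnrolled4(int n) {
    int y = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        y = y + i;
        y = y + (i + 1);
        y = y + (i + 2);
        y = y + (i + 3);
    }
    for (; i < n; ++i) {
        y = y + i;
    }
    return y;
}
```

Both functions compute the same result; unrolling only cuts the number of compare-and-branch steps per element. Once the body is a couple of adds, that's why the direct-call loop ends up so much smaller than the one that has to make four real calls per trip.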

It's the same on 32-bit, except there's no unrolling. The non-virtual loop body is 2 instructions (inc, add). The interface version has a push, 3 movs, and a call. The virtual version needs two extra movs to load the function pointer, whereas with the interface the call target's address is embedded as a literal.

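Shrug. Maybe it still doesn't work with value types? I started the program outside VS, then broke in with the debugger to get the disassembly, so I should be looking at the optimized JIT output rather than the debug-friendly code VS produces when it launches the process itself.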

The loop is doing "y = x.Add(y, i)" where y is a local.
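The experiment has roughly the following shape. This is a C++ analogue rather than the poster's C# program: IAdder, Adder, LoopViaInterface and LoopDirect are hypothetical names, and an abstract base class plays the role of the C# interface.

```cpp
// Hypothetical stand-in for the C# interface with int Add(int, int).
struct IAdder {
    virtual ~IAdder() = default;
    virtual int Add(int a, int b) const = 0;
};

// Hypothetical implementation; `final` means no subclass can override Add.
struct Adder final : IAdder {
    int Add(int a, int b) const override { return a + b; }
};

// "Cast to the interface" version: the static type is IAdder, so each
// x.Add is a virtual call unless the compiler can prove the dynamic type.
int LoopViaInterface(const IAdder& x, int n) {
    int y = 0;                 // y is a local, as in the original loop
    for (int i = 0; i < n; ++i) {
        y = x.Add(y, i);
    }
    return y;
}

// Direct version: the static type is the final class Adder, so the call
// can be resolved at compile time and inlined into the loop.
int LoopDirect(const Adder& x, int n) {
    int y = 0;
    for (int i = 0; i < n; ++i) {
        y = x.Add(y, i);
    }
    return y;
}
```

Both functions return the same value; the difference shows up only in the generated code. A C++ compiler makes the devirtualize-and-inline decision at compile time, while the CLR's JIT makes the corresponding decision at run time, which is what the disassembly here is probing.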

Edit: Aha! Using an interface method (not a virtual one) with strings, I was able to get it inlined. I guess the CLR is still weak at dealing with value types.

[1] Start of the loop using an interface:

  lea r8d,[rdi-1]
  mov rbx,qword ptr [FFEEFE60h]
  mov edx,eax
  mov rcx,rsi ; rsi is the object pointer
  lea r11,[FFEEFE60h] ; r11 = address of the dispatch (indirection) cell, which the CLR's interface stub dispatch reads
  call rbx ; the callee just does lea eax,[rdx+r8], ret
  ; similarly 3 more times then loop
[2] Loop body without the interface:

  lea eax,[r8-1] ; r8's the counter
  add ecx,eax   
  lea edx,[rcx+r8]
  lea ecx,[rdx+rax]
  lea eax,[r8+2]
  add ecx,eax
  ; then loop

Nice work digging in and figuring out how to trigger it. I fiddled around a bit earlier and wasn't able to reproduce the behavior I saw before, so I gave up. :) You're correct that the CLR does a poor job optimizing value types, and I probably made the same mistake (i.e. used a struct).