Are there any conditions where the compiler can optimize away the call to objc_msgSend? Or is it always used for any call between 2 ObjC/Swift methods?
In general, no: message sends are observable and swizzlable, so they must go through at least the bare minimum of objc_msgSend, which resolves the implementation through the method cache. The only exception to this that I know of is the retain/release methods, which are rarely overridden and called extremely often, so they have code that doesn't go through the message sending machinery unless necessary (in the case of a custom implementation). And of course, pure Swift calls don’t use the Objective-C runtime at all so they can be optimized as usual.
Always wondered about this too, but there doesn't seem to be a way of knowing which implementation your code will end up calling in Objective-C. "Final" would be helpful but there's no "final" in the language.
Speculative inlining driven by heuristics or PGO should always be possible. Same thing is done for virtual functions in C++ (or any indirect call really).
Objective-C's OO model is far more dynamic than you may realize. There're no guarantees even on the object reference type at run time, i.e. that the method you are calling will be applied to an object of a specific type or compatible. ObjC is very permissive in this regard, though it can give warnings in certain cases but they are never 100% precise (from my experience anyway). You could run analysis on the entire end product and still be unsure of what's going to happen at run time.
This is also true for languages like Self, which pioneered inline (and polymorphic) caches in the first place. Self did it using JITs, but you "only" need a sufficient hit rate to make up for the check+branch to justify inlining the most likely option(s) at compile time.
The interesting thing is that by inlining, for the inline case you will often gain additional type information. To take an example from Ruby, since I don't know Objective-C very well: in isolation you have no way of telling what type "foo + 1 + 2 + 3" will return, as it depends entirely on "foo". But let's say most call sites calling the method where this expression is found pass an integer.
If I can guarantee that "foo + 1" is an Integer addition, then I know it will return an Integer, and so I know the same addition method will be used for the next addition (and by extension the next as well), so I can turn the above into the following Ruby-ish pseudo-code instead:
if foo.is_a?(Integer)
  # By recursively inlining, I know not just the type of foo, but the type of the full expression.
  inlined foo + 1 + 2 + 3
else
  foo.send(:+, 1).send(:+, 2).send(:+, 3)
end
Even when you can't safely inline the actual calls, you can often elide checks or resort to more specialized method caching.
Yes, of course with a JIT you can do much better than with an AOT compiler, and for dynamic languages a JIT is pretty much required to get reasonable performance.
Well, compared to profile-guided optimization as mentioned by the other commenter earlier, that's really only the case if the profile of called methods varies greatly between runs.
The polymorphic inline caching from Self for example is guided by collecting simple stats. Tracing does the same. A JIT ensures those stats are always completely up to date, but nothing stops you from saving it and using it for an AOT compiler as well.
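To make the "collecting simple stats" part concrete, here is a minimal sketch of a per-call-site polymorphic cache in plain Ruby. The CallSite class, its max_entries limit and the profile method are hypothetical names invented for this illustration; this is not how Self or any real runtime implements it, only the general shape of the idea.

```ruby
# Hypothetical per-call-site cache: remembers which method each receiver
# class resolved to, and counts how often each class shows up.
class CallSite
  attr_reader :hits

  def initialize(selector, max_entries = 4)
    @selector = selector
    @max_entries = max_entries
    @cache = {}           # receiver class => UnboundMethod
    @hits = Hash.new(0)   # receiver class => number of calls seen
  end

  def call(receiver, *args)
    klass = receiver.class
    @hits[klass] += 1
    meth = @cache[klass]
    if meth.nil?
      # Slow path: full lookup; cache it while there is room.
      meth = klass.instance_method(@selector)
      @cache[klass] = meth if @cache.size < @max_entries
    end
    meth.bind(receiver).call(*args)
  end

  # Receiver classes ordered from most to least frequently seen.
  def profile
    @hits.sort_by { |_klass, count| -count }.map(&:first)
  end
end

site = CallSite.new(:to_s)
[1, 2, 3, "a", :b, 4].each { |x| site.call(x) }
```

The profile list is the kind of data that could be saved after a run and fed to an AOT compiler to decide which receiver class to inline for at this call site. The sketch ignores method redefinition and method_missing, which a real cache would have to handle.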
But often even that is overkill, as you can often statically deduce a lot about the types a method is likely to get called with by simply looking at the call sites, and most programs have very static call profiles.
Doesn't matter: at some point there's an indirect function call, and the compiler can try to guess the target, inline it, and add an address check that, on failure, falls back to the slow path.
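Roughly what that looks like, written out by hand in Ruby. Circle, Square and area_speculated are made-up names for illustration, and the guard here is a class check standing in for the address check; the sketch assumes Circle#area is never redefined after "compilation", which a real implementation would have to detect and invalidate.

```ruby
# Hypothetical classes used only for this illustration.
class Circle
  attr_reader :r

  def initialize(r)
    @r = r
  end

  def area
    3 * r * r # integer approximation keeps the example exact
  end
end

class Square
  def initialize(side)
    @side = side
  end

  def area
    @side * @side
  end
end

# What a compiler might emit for `shape.area` after guessing that the
# receiver is almost always a Circle.
def area_speculated(shape)
  if shape.class.equal?(Circle)
    # Fast path: body of Circle#area inlined, no dynamic dispatch.
    3 * shape.r * shape.r
  else
    # Slow path: guess was wrong, fall back to an ordinary send.
    shape.send(:area)
  end
end
```

Both paths give the same answer as a plain shape.area; the only difference is that the guessed case skips the lookup and exposes the inlined body to further optimization.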
This actually seems like it’d be quite beneficial, as I’d assume 90+% of method call targets can be statically guessed just by looking at the code (to increase this ratio even more, I’m sure Apple could even ignore Cocoa methods that use the forwarding machinery).
I don't know about Objective-C specifically, but generally one way to achieve this in most dynamic languages is a speed/memory tradeoff.
In my Ruby compiler project which has to deal with the same level of dynamic behaviour, I handle dynamic overriding of methods with C++-style vtables, which turns method calls into the equivalent of this C-ish pseudo-code:
(* ob->vtable[some_method_offset])(args...)
Since Ruby classes can have a method_missing handling undefined methods, for any class that doesn't implement a given method, the vtable contains a pointer to a thunk that tweaks the stack to push the relevant method symbol as the first argument and then does the same as above with the method offset of method_missing.
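A toy model of that layout in Ruby itself, with lambdas standing in for function pointers and a hash standing in for an object. OFFSETS, thunk_for and send_via_vtable are hypothetical names for this sketch; the real compiler works on machine-level vtables and stack manipulation, which this does not attempt to show.

```ruby
# Global selector numbering: every known method name gets one slot
# in every class's vtable.
OFFSETS = { method_missing: 0, speak: 1, fly: 2 }.freeze

# Thunk stored in the slot of any method the class does not implement:
# it prepends the method name and re-dispatches to the method_missing slot.
def thunk_for(name)
  lambda do |obj, *args|
    obj[:vtable][OFFSETS[:method_missing]].call(obj, name, *args)
  end
end

# Equivalent of (* ob->vtable[some_method_offset])(args...)
def send_via_vtable(obj, name, *args)
  obj[:vtable][OFFSETS[name]].call(obj, *args)
end

dog_vtable = Array.new(OFFSETS.size)
dog_vtable[OFFSETS[:method_missing]] = lambda { |_obj, name, *_args| "no method #{name}" }
dog_vtable[OFFSETS[:speak]] = lambda { |_obj| "Woof" }
dog_vtable[OFFSETS[:fly]] = thunk_for(:fly) # Dog does not implement fly

dog = { vtable: dog_vtable }
```

Calling send_via_vtable(dog, :speak) goes straight to the implementation, while send_via_vtable(dog, :fly) lands in the thunk and ends up in the method_missing slot with :fly as its first argument.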
Since Ruby classes can have methods overridden at any time, if class Bar inherits from class Foo, which inherits from class Object, and I override a method in class Object that has previously been explicitly overridden in class Bar, this would happen:
- Store pointer to method in Object in ptr.
- Replace method pointer in Object's vtable.
- Iterate over all direct sub-classes of Object (but here we only care about Foo)
- Compare the same method offset against ptr. Since Foo has not overridden the method, it matches.
- Replace the pointer in Foo, and iterate over all direct sub-classes of Foo (but here we only care about Bar)
- Compare the method offset against ptr. Since Bar has overridden the method, it doesn't match, so leave it alone.
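The steps above can be sketched in a few lines of Ruby. VClass, define and propagate are hypothetical names for this illustration, and a hash keyed by method name stands in for the array indexed by method offset.

```ruby
# Hypothetical model of a class with a vtable and a list of direct subclasses.
class VClass
  attr_reader :name, :vtable, :subclasses

  def initialize(name, parent = nil)
    @name = name
    @subclasses = []
    # A new class starts with a copy of its parent's vtable.
    @vtable = parent ? parent.vtable.dup : {}
    parent.subclasses << self if parent
  end

  # Define or override a method in this class.
  def define(slot, impl)
    old = @vtable[slot]   # "store pointer to method in ptr"
    @vtable[slot] = impl  # replace the pointer in this class's vtable
    @subclasses.each { |sub| sub.propagate(slot, old, impl) }
  end

  # Update a descendant only if it still held the inherited pointer.
  def propagate(slot, old, impl)
    return unless @vtable[slot].equal?(old) # overridden here: leave it alone
    @vtable[slot] = impl
    @subclasses.each { |sub| sub.propagate(slot, old, impl) }
  end
end

object = VClass.new("Object")
foo = VClass.new("Foo", object)
bar = VClass.new("Bar", foo)

object.define(:greet, lambda { "Object#greet" })
bar.define(:greet, lambda { "Bar#greet" })           # explicit override in Bar
object.define(:greet, lambda { "Object#greet v2" })  # later override in Object
```

After the last line, Foo has picked up the new Object implementation, while Bar still points at its own override, because the walk stops at any class whose slot no longer matches the old pointer.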
This means that as long as method overrides don't happen extremely frequently, the cost of method overrides is relatively low: iterate over all the descendant classes of the class you override a method in.
Method calls on the other hand are about as cheap as virtual method calls in C++, except when you hit method_missing where this approach gives you a very low extra overhead of tweaking the stack to add the symbol and jumping to the method_missing implementation.
This overall approach works for most dynamic languages. The caveat is memory - if you have an application with very large class hierarchies in a language where they are all singly rooted (as in Ruby where they all ultimately inherit from BasicObject), each vtable will cost you at least pointer_size*global_number_of_method_names. In practice so far I've not seen all that many cases where this is a problem, and it's always possible for the compiler to set a ceiling above which it will resort to a slow send mechanism (because you'll need to support that anyway in any language that allows dynamic send mechanics; e.g. in Ruby you can always send a message to an object by a dynamically obtained symbol so you still need the equivalent of objc_msgSend as well).
A slightly cheaper approach in terms of memory was described by Michael Franz[1]. His approach was to group methods in interfaces, so instead of a vtable of method pointers, you had a vtable of pointers to interfaces with pointers to methods. You save memory as most classes would typically implement most or none of the methods of an interface; it provides potential namespacing of the methods if you want to do that, and you can cut memory further by re-using the same vtable for an interface until someone tries to override at least one method in it. The cost is one extra indirection at call-sites.
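A rough sketch of that grouping idea, based only on the description above and not on the details of Franz' paper. IClass, override and dispatch are hypothetical names; each class holds a table of interface tables and shares them with its parent until it writes to one.

```ruby
# Hypothetical class model: interface name => { method name => implementation }.
class IClass
  attr_reader :interfaces

  def initialize(parent = nil)
    # Shallow copy: the per-interface tables themselves are shared with the parent.
    @interfaces = parent ? parent.interfaces.dup : {}
    @owned = {} # interfaces whose table this class has already copied
  end

  def override(iface, meth, impl)
    unless @owned[iface]
      # First write to this interface: take a private copy of its table.
      @interfaces[iface] = (@interfaces[iface] || {}).dup
      @owned[iface] = true
    end
    @interfaces[iface][meth] = impl
  end

  # Two lookups instead of one: the extra indirection at the call site.
  def dispatch(iface, meth, *args)
    @interfaces.fetch(iface).fetch(meth).call(*args)
  end
end

base = IClass.new
base.override(:Ordering, :before?, lambda { |a, b| a < b })
child = IClass.new(base)
shared_before = child.interfaces[:Ordering].equal?(base.interfaces[:Ordering])
child.override(:Ordering, :before?, lambda { |a, b| a > b })
shared_after = child.interfaces[:Ordering].equal?(base.interfaces[:Ordering])
```

Here shared_before is true and shared_after is false: the child costs one pointer per interface until it overrides something in that interface. The sketch leaves out propagating later changes in a parent down to its children.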
Generally impractical for real-world ObjC, which has around 100,000 method names: at 8 bytes per pointer, a full vtable would cost roughly 800 KB per class.
Apple ObjC used to have a simplified version of this which used a vtable for the most frequent 16 selectors. It wasn't considered profitable after sufficient optimizations on the hash-table based method cache.
Apart from sounding absolutely crazy (not doubting you; I've seen how verbose Objective-C can get), that sounds like the method names will almost certainly consist of many sets of names that are likely only used by small sets of classes, in which case the extra indirection in Franz' approach should work just fine, and not require any caching.
What I saw when looking at this before choosing to go that route years ago is that most dynamic language implementations seem to have just rejected vtable-based approaches out of hand, or never considered them at all, because vtables have come to be seen as a "write once" approach unsuitable for updates, and the default assumption has become that a complex lookup is needed.
It's been a few years since I looked, but last time I looked Franz' paper was the only one I found that investigated dynamic changes to objects at runtime using a vtable-like approach at all, and ironically he did so with a statically typed language... It seems like a curious blindness to me - maybe it genuinely is unsuitable for Objective-C, but most dynamic dispatch mechanisms I've looked at have been in environments where the number of names being looked up tends to be too small for even Franz' approach to be necessary.
(For Ruby the method name count tends to remain quite small, to the point where Franz' approach doesn't seem worth the cost of that extra indirection most of the time)