| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by vidarh 2484 days ago

I don't know about Objective-C specifically, but generally one way to achieve this in most dynamic languages is a speed/memory tradeoff.

In my Ruby compiler project which has to deal with the same level of dynamic behaviour, I handle dynamic overriding of methods with C++-style vtables, which turns method calls into the equivalent of this C-ish pseudo-code:

   (* ob->vtable[some_method_offset])(args...)

Since Ruby classes can have a method_missing handling undefined methods, for any class that doesn't implement a given method, the vtable contains a pointer to a thunk that tweaks the stack to push the relevant method symbol as the first argument and then does the same as above with the method offset of method_missing.

Since Ruby classes can have methods overridden at any time, if class Bar inherits from class Foo, inherits from class Object, and I override a method in class Object, that again has previously been explicitly overridden in class Bar, this would happen:

    - Store pointer to method in Object in ptr.
    - Replace method pointer in Object's vtable.
    - Iterate over all direct sub-classes of Object (but here we only care about Foo)
    - Compare the same method offset against ptr. Since Foo has not overridden the method, it matches.
    - Replace the pointer in Foo, and iterate over all direct sub-classes of Foo (but here we only care about Bar)
    - Compare the method offset against ptr. Since Bar has overridden the method, it doesn't match, so leave it alone.

This means that as long as method overrides doesn't happen extremely frequently, the cost of method overrides is relatively low: iterate over all the descendant classes of the class you override a method in.

Method calls on the other hand are about as cheap as virtual method calls in C++, except when you hit method_missing where this approach gives you a very low extra overhead of tweaking the stack to add the symbol and jumping to the method_missing implementation.

This overall approach works for most dynamic languages. The caveat is memory - if you have an application with very large class hierarchies in a language where they are all singly rooted (as in Ruby where they all ultimately inherit from SimpleObject), each vtable will cost you at last pointer_size*global_number_of_method_names. In practice so far I've not seen all that many cases where this is a problem, and it's always possible for the compiler to set a roof above which it will resort to a slow send mechanism (because you'll need to support that anyway in any language that allows dynamic send mechanics; e.g. in Ruby you can always send a message to an object by a dynamically obtained symbol so you still need the equivalent of objc_msgSend as well).

A slightly cheaper approach in terms of memory was described by Michael Franz[1]. His approach was to group methods in interfaces, so instead of a vtable of method pointers, you had a vtable of pointers to interfaces with pointers to methods. You save memory as most classes would typically implement most or none of the methods of an interface; it provides potential namespacing of the methods if you want to do that, and you can cut memory further by re-using the same vtable for an interface until someone tries to override at least one method in it. The cost is one extra indirection at call-sites.

[1] Protocol Extension: A Technique for Structuring Large Extensible SoftwareSystems (1994) https://pdfs.semanticscholar.org/60d3/045d924dfb912e184014aa...

1 comments

gok 2484 days ago

Generally impractical for real-world ObjC, which has around 100,000 method names.

Apple ObjC used to have a simplified version of this which used a vtable for the most frequent 16 selectors. It wasn't considered profitable after sufficient optimizations on the hash-table based method cache.

vidarh 2483 days ago

Apart from sounding absolutely crazy (not doubting you; I've seen how verbose Objective-C can get), that sounds like the method names almost certainly will consists of many sets of names that are likely only used by small sets of classes, in which case the extra indirection in Franz' approach should work just fine, and not require any caching.

What I saw when looking at this before choosing to go that route years ago is that most dynamic language implementations seems to have just rejected vtable based approaches out of hand or never considered them at all because it's become seen as a "write once" approach unsuitable for updates and the default assumption has become that a complex lookup is needed.

It's been a few years since I looked, but last time I looked Franz' paper was the only one I found that investigated dynamic changes to objects at runtime using a vtable-like approach at all, and ironically he did so with a statically typed language... It seems like a curious blindness to me - maybe it genuinely is unsuitable for Objective-C, but most dynamic dispatch mechanisms I've looked at have been in environments where the number of names being looked up tends to be too small for even Franz' approach to be necessary.

(For Ruby the method name count tends to remain quite small, to the point where Franz' approach doesn't seem worth the cost of that extra indirection most of the time)