Hacker News new | ask | show | jobs
by ryao 302 days ago
> The article describes how the Linux kernel, despite being written in C, embraces object-oriented principles by using function pointers in structures to achieve polymorphism.

This technique predates object oriented programming. It is called an abstract data type or data abstraction. A key difference between data abstraction and object oriented programming is that you can leave functions unimplemented in your abstract data type while OOP requires that the functions always be implemented.

The sanest way to have optional functions in object oriented programming that occurs to me would be to have an additional class for each optional function and inherit each one you implement alongside your base class via multiple inheritance. Then you would need to check at runtime whether the object is an instance of the additional class before using an optional function. With an abstract data type, you would just be do a simple NULL check to see if the function pointer is present before using it.

7 comments

In Smalltalk and Objective-C, you just check at runtime whether an object instance responds to a message. This is the original OOP way.

It's sad that OOP was corrupted by the excessively class-centric C++ and Java design patterns.

> In Smalltalk and Objective-C, you just check at runtime whether an object instance responds to a message. This is the original OOP way.

This introduces performance issues larger than the typical ones associated with vtable lookups. Not all domains can afford this today and even fewer in the 80s/90s when these languages were first designed.

> It's sad that OOP was corrupted by the excessively class-centric C++ and Java design patterns.

Both Smalltalk and Objective-C are class based and messages are single-receiver dispatched. So it’s not classes that you’re objecting to. It’s compile-time resolved (eg: vtable) method dispatch vs a more dynamic dispatch with messages.

Ruby, Python, and Javascript all allow for last-resort attribute/message dispatching in various ways: Ruby via `method_missing`, Python by supplying `__getattr__`, and Javascript via Proxy objects.

> This introduces performance issues larger than the typical ones associated with vtable lookups.

Don't know about other programming languages but with Objective-C due to IMP caching the performance is close to C++ vtable

  Name Iterations Total time (sec) Time per (ns)
  C++ virtual method call 1000000000 1.5 1.5
  IMP-cached message send 1000000000 1.6 1.6

https://mikeash.com/pyblog/friday-qa-2016-04-15-performance-...
In NeXTSTEP it was fast enough to have the whole OS, including device drivers, written in Objective-C.

In Smalltalk systems that stop being an issue after JITs got introduced.

When you fear about the branch miss, then you can also just call the method and catch the SIGSEGV. I think if you allow for the possibility of there being no implementation, then you can't really not have some decision there. This would also apply for say a C++ virtual method.
A SIGSEGV is not guaranteed when calling an address that does not have a function.
Are you talking about C? In C it's not guaranteed whether a non existing function will result in an actual function call. As soon as an actual function call was generated by the compiler, a modern CPU is very likely to trap.

We are talking about an optimization of a language implementation here. This would be very much written in a ASM or another language were this is defined.

Actually I would say that it is sad that developers learn a specific way how a technology is done in language XYZ and then use it as template everywhere else, what happened to the curiosity of learning?
The curiosity of learning is infeasible given that there are >15,000 programming languages. You might say to only learn the major/influential languages, but I have found my curiosity to wane with every additional language that I learn. So far, I have either used or dabbled in C, C++, FORTRAN, SML/NJ, PHP, Java, JavaScript, Python, Go, POSIX Shell, Assembly and SQL stored procedures. That is not to mention different dialects/versions of those, and the fact that assembly itself is a catch all category in which I have dabbled in multiple languages too. I am also not including languages for which I spent a minuscule amount of time (say a few hours), such as D (as in DTrace), Objective-C, Perl, COBOL and LISP. Then there is the misadventure into C# I took before I actually understood programming where I got stuck and dropped it. I am also not sure if I should mention AWK, as I only use it to select the Nth field in a bunch of lines of text when doing shell scripting and nothing else. Thus, I have used it many times, yet know almost none of its syntax.

Every few years, someone tells me I should learn another language, and in recent years, there just is no desire in my mind to want to learn yet another language that is merely another way of doing something that I already can do elsewhere and the only way I will is if I am forced (that is how I used Go).

That said, I do see what you are saying. C++ for example has an “support every paradigm” philosophy, so whenever someone who has learned C++ encounters a language using a paradigm that C++ assimilated, there is a huge temptation to try to view it through the lens of C++. I also can see the other side too: “C++ took me forever to learn. Why go through that again when I can use C++ as a shortcut to understand something else?”. C++ is essentially the Borg of programming languages.

How do you got proficient in so many languages? I think it takes some years, before you start to think in a language.
I did not know they say the same thing about programming languages.

Based on his comment, I did not think that he is proficient in them, but that he has used them, which is fair enough, so have I, sans all the ones tied to either Apple (Swift) or Microsoft (C#).

I have some projects in Haskell just for curiosity's sake, and because what I wanted seemed like it would be nice in Haskell, and it indeed looks quite elegant to me, for this one particular project. Haskell is not a language I would use generally. OCaml is.

You are correct. I am only highly proficient in a few of them. The others are ones that I have used for varying sized projects for varying reasons, but I only learned the subsets I needed for my purposes. I would say my ability in the others in my main list varies within the low to intermediate range of proficiency.
My proficiency in them varies. I have used them for projects, such that I can write code in them (provided I have references to read), but when I do, I only learn/use the subset of the language that I need. If I want to read code others have written, then I need to know the subset that they used and that is not always the subset that I know. Most of them are languages that I have used for a handful of projects or in the case of assembly, for very small portions of projects, with at least half of that spend just trying to understand how good of a job the compiler did in critical loops.

There are some languages in which I am extremely proficient. My best language is C, which is my favorite and I have used most features of every version of C from C89 to C11. My second best is probably either C++ or POSIX shell (although I have moments where I forget certain syntax and need to look it up, especially in POSIX shell for variants on variable substitution, e.g. ${VAR%%foobar}). I have used most features of C++98 and some from newer versions. My experiences with C++ have soured me on it, so I now try to avoid C++ whenever I can in favor of C and Python.

My first language was actually PHP 4.2.y, and I was fairly proficient in it, having spent a long time learning it while simultaneously writing my own code for a website as a teenager. However, I never once touched the portions describing objects/classes, namespaces or exceptions. Someone else at work writes Modern PHP code using Symfony and I have taken a peek at it. It looks very different from the PHP I knew because it uses the features I had avoided learning (and probably some new language features too), although I can sort of read it thanks to having learned those concepts in other languages.

I used SML/NJ and Java in college. Years after college, I modified an open source Android TV application written in Java to add some things I wanted, although honestly, beyond that I have not really touched either language. Give me an arbitrary application written in either to improve and I would have some difficulty, although I will probably be able to do it after filling the gaps in my understanding (and doing plenty of head banging if it is a large program).

I have used JavaScript for a few recent projects via electron/nodejs and I have done several small things in Python over the past several years. Each time, I only worked with the subset that I need. I am far from a master of either language that can understand arbitrary code written in them, but I am able to manage as far as my specific needs are concerned.

I could continue listing my experiences doing things in languages (like the time in college that I wrote some basic programs in FORTRAN 90 to try to learn it), but they really are not that interesting. It was often a project here or a small application there, and as I readily admitted, I only used a subset of most of the languages. For programming, a subset of commonly used bits is often all you need.

Java is excessively class centric. C++ often is, but it need not be and developers are movingiaway.

smalltalk is not original OO - c++ took oo from simula which was always a different system.

This is a very good point. A greater distinction needs to be made about class-based versus class-free OOP. Many of the problems or constraints around OOP that people don't like, come back to classes.

Often class-based programming is confused as being the only style of OOP, superior to all other styles, or heavy-handedly pushed on others. Many programmers are perfectly fine with using objects or only specific features of OOP, without classes, if they are "allowed" to.

Wait, so in obj-c, could you also write some kijdnof doesnotunderstand method to achieve some dynamic method dispatch?
Yes, that is how microservices were implemented in the days of NeXTSTEP. with PDO.

https://en.wikipedia.org/wiki/Portable_Distributed_Objects

It's how many things are implemented, like UI message routing, IPC, undo, test mocks.
So why did Swift become a thing? Or does Swift has this too?
Because Objective-C being based on C, means it would never be safe while remaining compatible with C.

Also dynamic runtime dispatch Smalltalk style can never be as fast as the VMT based dispatch, or compile time dispatch via generics, even with all the optimizations in place, that objc_msgSend() has had during its lifetime.

Still, Metal is implemented in Objective-C, so there is that.

Objective-C already has a different compiler I fail to see how the compiler is restricted in the checks it does.
There is always a subset of the population for which it is fashionable to try to make a new language to make things easier. It is inevitable that someone in a sufficiently large organization will try to make one, provided management supports it. Apple is a very wealthy company that is happy to sponsor R&D, had someone interested in this that they sponsored and the result was appealing enough that they decided to adopt it.

That is my understanding of how the process generally works and what I am willing to guess happened. Prior to this, they had been making incremental changes to Objective-C.

That said, from what I have seen of the syntax of both languages, swift’s syntax is nicer and that is not something that they would have been able to get from Objective-C. They already had tried syntax reform for Objective-C once in the past and abandoned it.

Because Objective-C does not have foo.bar() style method calls and that's what everybody else uses and wants.
Swift isn't messaging-based, it's protocol-based. (Except it's also messaging-based if you use @objc.)
Because Swift adds many other features.
Sure, but that is a a horrible coding pattern. Even good smalltalk code doesn't rely on this. It's dogshit and lazy
I largely agree, and use these patterns in C, but you’re neglecting the usual approach of having a default or stub implementation in the base for classic OOP. There’s also the option of using interfaces in more modern OOP or concept-style languages where you can cast to an interface type to only require the subset of the API you actually need to call. Go is a good example of this, in fact doing the lookup at runtime from effectively a table of function pointers like this.
My point is that this pattern is not object oriented programming. As for a default behavior with it, you usually would do that by either always adding the default pointer when creating the structure or calling the default whenever the pointer is NULL.

In the Linux VFS for example, there are optimized functions for reading and writing, but if those are not implemented, a fallback to unoptimized functions is done at the call sites. Both sets are function pointers and you only need to implement one if I recall correctly.

To be fair, OOP is not 100% absolutely perfectly defined. Strustrup swears C++ is OOP, Alan Key, at least at some point laughed at C++, and people using CLOS have yet another definition
You forgot about people using BETA, or Self, or ......
> My point is that this pattern is not object oriented programming.

I think the "is/is not" question is not so clear. If you think of "is" as a whether there's a homomorphism, then it makes sense to say that it is OOP, but it can qualify as being something else too, ie. it's not an exclusionary relationaship.

Object oriented programming implies certain contracts that the compiler enforces that are not enforced with data abstraction. Given that object oriented programming and data abstraction two live side by side in C++, we can spot the differences between member functions that have contracts enforced, and members function pointers that do not. Member functions have an implicit this pointer, and in a derived class, can call the base class version via a shorthand notation to the compiler (BaseClass::func() or super()), unless that base class version is a pure virtual function. Member function pointers have no implicit this pointer unless one is explicitly passed. They have no ability to access a base class variant via some shorthand notation to the compiler because the compiler has no contract saying that OOP is being done and there is a base class version of this function. Finally, classes with unimplemented member functions may not be instantiated as objects, while classes with unimplemented member functions pointers may.

If you think of the differences as being OOP implies contracts with the compiler and data abstraction does not (beyond a simple structure saying where the members are in memory), it becomes easier to see the two as different things.

So you can opt out or in to syntactic sugar, that makes C++ an interesting and useful language, but how you implement OOP, doesn't really affect if it is OOP.
By this logic, C is an objective oriented language. It is widely held to not be. That is why there were two separate approaches to extend it to make it object oriented, C++ and Objective-C.
> Object oriented programming implies certain contracts that the compiler enforces

Sorry, but where did you got this definition from? I've always thought OOP as a way of organizing your data and your code, sometimes supported by language-specific constructs, but not necessarily.

Can you organize your data into lists, trees, and hashmaps even if your language does not have those as native types? So you can think in a OO way even if the language has no notion of objects, methods, etc.

> Sorry, but where did you got this definition from?

It is from experience with object oriented languages (mainly C++ and Java). Technically, you can do everything manually, but that involves shoehorning things into the OO paradigm that do not naturally fit, like the article author did when he claimed struct file_operations was a vtable when it has ->check_flags(), which would be equivalent to a static member function in C++. That is never in a vtable.

If Al Viro were trying to restrict himself to object oriented programming, he would need to remove function pointers to what are effectively the equivalent of static member functions in C++ to turn it into a proper vtable, and handle accesses to that function through the “class”, rather than the “object”.

Of course, since he is not doing object oriented programming, placing pointers to what would be virtual member functions and static member functions into the same structure is fine. There will never be a use case where you want to inherit from a filesystem implementation’s struct file_operations, so there is no need for the decoupling that object oriented programming forces.

> I've always thought OOP as a way of organizing your data and your code, sometimes supported by language-specific constructs, but not necessarily.

It certainly can be, but it is not the only way.

> Can you organize your data into lists, trees, and hashmaps even if your language does not have those as native types?

This is an odd question. First, exactly what is a native type? If you mean primitive types, then yes. Even C++ does that. If you mean standard library compound types, again, yes. The C++ STL started as a third party library at SGI before becoming part of the C++ standard. If you mean types that you can define, then probably not without a bunch of pain, as then we are going back to the dark days of manually remembering offsets as people had to do in assembly language, although it is technically possible to do in both C and C++.

What you are asking seems to be exactly what data abstraction is, which involves making an interface that separates use and implementation, allowing different data structures to be used to organize data using the same interface. As per Wikipedia:

> For example, one could define an abstract data type called lookup table which uniquely associates keys with values, and in which values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list of (key:value) pairs. As far as client code is concerned, the abstract properties of the type are the same in each case.

https://en.wikipedia.org/wiki/Abstraction_(computer_science)...

Getting back to doing data structures without object oriented programming, this is often done in C using a structure definition and the CPP (C PreProcessor) via intrusive data structures. Those break encapsulation, but are great for performance since they can coalesce memory allocations and reduce pointer indirections for objects indexed by multiple structures. They also are extremely beneficial for debugging, since you can see all data structures indexing the object. Here are some of the more common examples:

https://github.com/openbsd/src/blob/master/sys/sys/queue.h

https://github.com/openbsd/src/blob/master/sys/sys/tree.h

sys/queue.h is actually part of the POSIX standard, while sys/tree.h never achieved standardization. You will find a number of libraries that implement trees like libuutil on Solaris/Illumos, glib on GNU, sys/tree.h on BSD, and others. The implementations are portable to other platforms, so you can pick the one you want and use it.

As for “hash maps” or hash tables, those tend to be more purpose built in practice to fit the data from what I have seen. However, generic implementations exist:

https://stackoverflow.com/questions/6118539/why-are-there-no...

That said, anyone using hash tables at scale should pay very close attention to how their hash function distributes keys to ensure it is as close to uniformly random as possible, or you are going to have a bad time. Most other applications would be fine using binary search trees. It probably is not a good idea to use hash tables with user controlled keys from a security perspective, since then a guy named Bob can pick keys that cause collisions to slow everything down in a DoS attack. An upgrade from binary search trees that does not risk issues from hash function collisions would be B-trees.

By the way, B-trees are containers and cannot function as intrusive data structures, so you give up some convenience when debugging if you use B-Trees.

> My point is that this pattern is not object oriented programming.

Isn't this exactly how most (every?) OOP language implements it? You would say a C++ virtual method isn't OOP?

Member function pointers and member functions in C++ are two different things. Member function pointers are not OOP. They are data abstraction.

The entire point of OOP is to make contracts with the compiler that forcibly tie certain things together that are not tied together with data abstraction. Member functions are subject to inheritance and polymorphism. Member function pointers are not. Changing the type of your class will never magically change the contents of a member function pointer, but it will change the constants of a non-virtual member function. A member function will have a this pointer to refer to the class. A member function pointer does not unless you explicitly add one (named something other than this in C++).

Yeah, but the compiler implements these by adding vtables, propagating vtables values across inheritance hierarchies, adding another parameter.

You claim when the compiler does this, it's OOP, but when I do it, it's not?

Ìf you do it, it can still be OOP, its just not in an OO language. People have trouble separating using a paradigm and using a language focused on the paradigm, for some reason.
The entire point of OOP in every OOP language that I have ever used has been to have the language try to constrain what you can do by pushing restrictions on syntactic sugar involving objects, inheritance and encapsulation, so I would say yes. The marketing claims that people will be more productive at programming by using these.
> This technique predates object oriented programming.

I would rather say that OOP is a formalization of predating patterns and paradigma.

OOP cannot be a formalization of what predated it because the predating patterns support things that OOP explicitly disallows, like instantiation with unimplemented functions. That is extremely useful when you want to implement an optional function, or mutually exclusive functions such that you pick which is optional. This should be the case in the Linux VFS with ->read() and ->read_iter(). Also, ASTs were formalized after OOP, despite existing prior to it in Lisp.

For full disclosure, I have never verified that leaving ->read() unimplemented when ->read_iter() is implemented is safe, but I have seen enough examples of code that I strongly suspect it is and if it is not, it is probably a bug.

OOP does not disallow instantiation with unimplemented functions, it's just an artefact of implementation in some languages.
OOP only disallows inheritance with unimplemented functions when it's a contract violation.

So that is to say, if the base class has a certain function which must be implemented and must provide certain behaviors, then the derived class must implement that function and provide all those behaviors.

The POSIX functions like read and write do not have a contract which says that all implementations of them must successfully transfer data. Being unimplemented (e.g returning -1 with errno EOPNOTSUPP or whatever) is allowed in the contract.

OOP just wants a derived thing to obey the contract of the abtraction it is inheriting, so if you want certain liberties, you have to push them into the contract.

I would call returning something being implemented as a stub rather than being unimplemented. When something is unimplemented and you try to call it, you crash due to a NULL/invalid pointer dereference, not get an error back. Of course, as far as getting things done is concerned, the two might as well be the same, but as far as how the language works, the two are different.
That's just it; in the POSIX world, the library functions like read and write are just a facade. They call something in a kernel, and that something examines the file descriptor and dispatches a lower level function. It's possible that that is literally unimplemented: as in a null function pointer in some structure. The facade converts that to a -1 return with an approprriate `errno` like EOPNOTSUPP (operation not supported).
Crashing is optional, depending on error model of the language. C has pitiful error model, thus you'll usually end up jumping to 0... but I recall at least one C environment where that gave you an error back instead of crash.

As far as OOP is concerned, lack of implementation is not an issue in instantiating something - an object will just not understand the message, possibly catastrophically.

Having a stub or having a NULL pointer are two ways to leave something unimplemented. Which you use is an implementation detail. You can also use some other sigil instead of NULL.
That sounds like a case of the Liskov substitution principle.
Yes it is, but the point is that without the contract being defined, we can't apply the principle; the details of the interface contract determine what it means to be substitutable.

The LSP is never absolute in a practical system. Because why would you have, say, in a casee of inheritance, a Y which is an new kind of X, if all it did was substitute for X, and behave exactly the same way? Just use X and don't create Y; why create a substitute that is perfectly identical.

If there is a reason to introduce a new type Y which can be used where X is currently used, then it means they are not substitutable. The new situations need a Y and not X, and some situations don't need a Y and stay with X.

In a graphics program, an ellipse and rectangle are substitutable in that they have a draw() method and others, so that they plug into the framework.

But they are not substitutable in the user's design; where the user needs an ellipse, a rectangle won't do, and vice versa.

In that case the contract only cares about the mechanics of the program: making multiple shapes available in a uniform way, with a program organization that makes it easy to code a new shape.

The substitution principle serves the program organization, and not much more.

So with the above discussion in place, we can make an analogy: an ellipse object in a vector graphics program can easily be regarded as a version of a rectangle with unimplemented corners.

The user not onnly doesn't mind that the corners are not implemented, but doesn't want them because they wouldn't make sense.

In the same way, it doesn't make sense to have lseek() on a pipe, or accept() on TTY descriptor or write() on a file which is read only to the caller, etc.

> like instantiation with unimplemented functions

I think this is more of an effect of C distinguishing between allocating memory (aka object creation) and initialization, which other languages disallow for other reasons, not because there are not OOPy enough.

Unlike OOLs, C does not enforce contracts around structures (objects). You are free to do literally anything you want to them.
Including implementing Abstract Data Types, thus everything on the structs is only accessible via functions that enforce the contracts.
The concept of abstract data type is a real idea in the days of compiler design. You might as well say "compiler design predates object oriented programming". The technique described in the lead is used to implement object-oriented programming structures, just as it says. So are lots of compiler design features under the hood.

source- I wrote a windowing framework for MacOS using this pattern and others, in C with MetroWerks at the time.

Compiler design does predate object oriented programming. The first compiler was made by John Backus et al at IBM in April 1957.

As for abstract data types, they originated in Lisp, which also predates object oriented programming.

Actually, no.

"AN ALGORITHMIC THEORY OF LANGUAGE", 1962

https://apps.dtic.mil/sti/tr/pdf/AD0296998.pdf

In this paper they are known as plexes, eventually ML and CLU will show similar approaches as well.

Only much latter would Lisps evolve from plain lists and cons cells.

You caused me to do some digging. That publication is dated November 1962. The Lisp 1.5 manual’s preface is dated August 17, 1962, which is even older. It describes lambdas and property lists, which seem like they can be used to implement ADTs, although I do not have a Lisp 1.5 interpreter since those are obsolete, so I cannot verify that. Computer history articles claim that Simula, the first object oriented language, was born in May 1962, but was not actually operational until January 1965:

https://history-computer.com/software/simula-guide/

Thus, while I had thought Lisp had ADT concepts before the first OOL existed, now I am not sure. My remark that they originated in Lisp had been said with the intention that I was talking about the first language to have it. The idea that the concept had been described outside of an actual language is tangential to what I had intended to say, which is news to me. Thanks for the link.

Plexes are first mentioned in 1960

https://dl.acm.org/doi/pdf/10.1145/366199.366256

and the paper even starts with a critique of the efficiency of Lisp's approach for representing data with cons pairs (citing McCarthy's paper from the same year).

You might also want to watch Casey's great talk on the history of OOP

https://www.youtube.com/watch?v=wo84LFzx5nI

You can do exactly what was done in C with most OOP languages like Java & C# because you have lambdas now, and lambdas are just function pointers. You can literally assign them to instance variables (or static variables).

(sorry it took more than a decade for Java to catch up and Sun Microsystems originally sued Microsoft for trying to add lambdas to java way back when, and even wrote a white paper insisting that anonymous inner classes are a perfectly good substitute - stop laughing)

Inheritance is not needed when a composite pattern can be used.

class DefaultTask { }

class SpecialTask { }

class UsedItem {

    UsedItem() { _task = new SpecialTask() }
    
    void DoIt() { _task.DoIt() }
}

Is python a OOP language? Self / this / object pointer has to be passed similar to using C style object-oriented / data abstraction.

The interesting thing is, that in the OOP implementation inheritance IS composition of vtables and data. It's really only syntactic sugar, that is sometimes not unambiguous.
This is not quite correct. OOP implementation inheritance involves a kind of "open recursion" (that is, calls to base-class methods can end up dispatching to implementations from a derived class) that is not replicated with pure composition. All method calls, including method calls that originate from code in some class anywhere in the hierarchy, ultimately dispatch through the vtable of whatever object they're called on.
But that's exactly the same you would need to implement manually when you use composition. When constructing, you also need to construct the contained objects, when doing something that should affect a contained object, you need to dispatch it.

When a method is never overridden, it doesn't need to be in the vtable.

That's not what people usually mean by "composition" though. The whole point of "use composition over inheritance" is to avoid that behavior.
I don't see how you can use composition without eventually calling methods of the contained objects. Every method of the outer object either uses an inner object or it doesn't. Yes, using the inner object doesn't always means just delegating to a single call. You maybe would implement the outer by calling an inner objects method multiple times or different methods, but nothing stops you of doing the same thing with a super class. When you don't call to an inner object, it's the same as you adding another method to the subclass, without it being present in the parent class.

I think composition over inheritance is only about being explicit. That's it.

Python doesn't require self to be passed. You need it in method definitions, but not calls.
But you can do it. Actually you can call instance methods of other classes and change the class of an instance in Python like in C, but this dynamism is probably what makes it slow. Also doing that will make the program quite complicated, more so than in C, since Python also abstracts about this.
An abstract data type is a software design pattern.

The difference is that design patterns are a technique where you use features not implemented by the compiler or language, and all the checks have to be done by the developer, manually.

Thus, you are doing part of the work of the compiler.

In assembler, a function call is a design pattern.