Dependency Inversion in C Using Function Pointers (ernstsson.net)
48 points by ernstsson 5096 days ago
7 comments

... annnnd, we've reinvented callbacks. They've been around a long time, they're practically primordial. Cavemen were knapping callbacks out of flint in anticipation of the first computers.

I have some rules for these that have served me well:

1. Never hold locks when you're calling someone back. You have NO idea how the caller is going to abuse you.

2. Be prepared to handle recursion (usually with deferral of some kind, or possibly an error), because at some point the callee is going to call you back.

3. Always provide a 'void*' or some other context to be passed along with the function pointer (or the callee is doomed to use a global).

4. Document what the callee can do. For instance, if you're a timer object making "alarm clock" callbacks, forbid callees from taking too much time before they return; assert if they blow it.
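A minimal sketch of rules 1 to 3, assuming a single-threaded "alarm" object. Every name here (`alarm_fn`, `struct alarm`, `alarm_fire`, `count_and_refire`) is made up for illustration and is not from the article. The function pointer always travels with a `void *` context, and a re-entrant fire is deferred instead of recursing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; nothing here is from the article. */
typedef void (*alarm_fn)(void *ctx);

struct alarm {
    alarm_fn fn;      /* rule 3: the function pointer... */
    void    *ctx;     /* ...always travels with a context pointer */
    int      firing;  /* rule 2: nonzero while a callback is running */
    int      pending; /* rule 2: fires requested during a callback */
};

void alarm_init(struct alarm *a, alarm_fn fn, void *ctx)
{
    assert(fn != NULL);
    a->fn = fn;
    a->ctx = ctx;
    a->firing = 0;
    a->pending = 0;
}

/* Rule 1: call this with no locks held. If a lock protects the alarm,
 * release it before the callback runs. */
void alarm_fire(struct alarm *a)
{
    if (a->firing) {          /* the callee called back into us: defer */
        a->pending++;
        return;
    }
    a->firing = 1;
    a->fn(a->ctx);
    while (a->pending > 0) {  /* run deferred fires without recursion */
        a->pending--;
        a->fn(a->ctx);
    }
    a->firing = 0;
}

/* Example callee: keeps its state in the context, not in a global. */
struct counter {
    struct alarm *self;
    int calls;
};

void count_and_refire(void *ctx)
{
    struct counter *c = ctx;
    c->calls++;
    if (c->calls == 1)
        alarm_fire(c->self);  /* re-entrant call, gets deferred */
}
```

Deferral is only one way to handle rule 2; returning an error from the re-entrant call would be another reasonable choice.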

Well, not reinventing to be honest, rather redocumenting. There's a lot of caveman behavior that still needs to be taught to "cave-kids". You're correct, an experienced C programmer should know this, but I do expect more to join the ranks. Good list of rules! They're good to keep in mind even with the original tangled design, but resolving the tangle clarifies who is responsible for them even further (the documentation, making sure there's a void*, etc.).
Agreed. I find articles like this great for passing along to junior engineers to help extend their vocabulary, making our pairing more productive. So, thank you for this nice little example of using dependency injection to remove static dependencies. Which, besides reducing "smell", makes unit testing more tractable.

I do agree with the parent's list--especially the "void*" pointer for passing around context. Unless the injected routine is doing something very simple, some context is almost always required. Providing that along with the function pointer helps avoid globals--and thus avoid unnecessary singletons.

I could see how providing a complete example that illustrates the use of this context might muddy the core focus of your article. Maybe a follow-up article? :) Thanks again for creating teaching material for me.

I'll add: The times that I've left off a void* context, I've always wanted one later. Just put one there. Honest. Don't think about functions without also thinking about their environments.

(In languages with closures, you'd just use a closure. Passing a void* around is C's meatball way of expressing an execution context).

Two problems: performance and premature abstraction.

First, using function pointers instead of direct calls drastically reduces the compiler's ability to optimise function calls. Second, on most architectures, calling a function through a function pointer incurs a noticeable overhead compared to a simple function call.

But the other problem is premature abstraction. The code in this example is tangled because the "server" has only one "client". If at least one more client had to be supported at runtime, it would surely be turned into something similar to what has been suggested. But removing the coupling before you know the similarities and dissimilarities between "clients" can only bring problems.

Obviously all this applies only to internal calls, not stable APIs.

Normally compilers do not optimize function calls between files anyway, so in this specific case that wouldn't matter. Within the same file it would've been different of course, which brings us to the second point: yes, in this simplified example it's really very silly to have two files at all. The code itself isn't bad, just badly abstracted. The best refactoring for this limited problem would of course be to combine the files, but then the description of how to invert dependencies would've been lost. That's the cost of keeping examples easy to grasp in a short blog post, I guess.
> Normally compilers do not optimize function calls between files anyway, so in this specific case that wouldn't matter.

This is why function pointers are poison to the optimizer. There can be a 10-100x perf difference between C qsort and C++ std::sort because the use of that function pointer kills performance but in the C++ case the sort function is a template in a header file and can be inlined and then further optimized.

Thankfully, all major C compilers have at least some link time optimization efforts going on. When link time optimization becomes more widely available, we can finally stop thinking about translation units and use function pointers as much as we like.

So, to summarize the comments above: function pointers between files aren't bad for performance if we can't do link time optimization. When link time optimization really becomes widely available, they're still not bad? So only the step in the middle could be affected? Thinking out loud here, any thoughts on this? I usually opt for structure over performance before profiling proves otherwise, but I still find this to be an interesting topic.
> So, to summarize the comments above: function pointers between files aren't bad for performance if we can't do link time optimization.

No. Function pointers between files are horrible for performance. Do not use them if you're on the fast path. A function pointer (across translation units) not only destroys the compiler's ability to optimize, it also kills your CPU's instruction cache. You can solve the problem by using inline functions and applying __attribute__((always_inline)) or similar if required. If in doubt, check the assembly output of your compiler to verify that all calls have been inlined.
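A rough sketch of the two shapes being compared, assuming GCC or Clang (`__attribute__((always_inline))` is a compiler extension, not standard C). All function names are hypothetical. The first variant reaches the comparison through a pointer; the second calls an inline function directly, so the compiler can see the body at the call site:

```c
#include <assert.h>
#include <stddef.h>

/* Indirect: the comparison is only known through a pointer. */
typedef int (*less_fn)(int a, int b);

int less_plain(int a, int b)
{
    return a < b;
}

int max_indirect(const int *v, size_t n, less_fn less)
{
    assert(n > 0);
    int best = v[0];
    for (size_t i = 1; i < n; i++)
        if (less(best, v[i]))     /* call through the pointer */
            best = v[i];
    return best;
}

/* Direct: the body is visible here and is requested to be inlined.
 * The attribute is GCC/Clang specific. */
static inline __attribute__((always_inline)) int less_int(int a, int b)
{
    return a < b;
}

int max_direct(const int *v, size_t n)
{
    assert(n > 0);
    int best = v[0];
    for (size_t i = 1; i < n; i++)
        if (less_int(best, v[i])) /* direct call, inlinable */
            best = v[i];
    return best;
}
```

Both return the same result; whether the difference matters is something to confirm in the generated assembly or a profile, as suggested above.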

Link time optimization is the cure. It's not widely available at the moment, but maybe in a few years it will be more important. If you work in a relatively isolated piece of software, it's possible that you can use a compiler with link time optimization and get the perf gains today.

Meanwhile, don't use function pointers across translation units if you know you're on the fast path. If you're optimizing something that is profiled to be slow, look for function pointers because getting rid of them can give a big boost.

> I usually opt for structure over performance before profiling proves otherwise, but I still find this to be an interesting topic.

You should always go for structure over performance except when you should not.

FYI GCC and LLVM both do whole program optimisation these days. Not that this is going to help you with function pointers of course :)
Injection via parameters (setters, constructors, and plain parameters) was in use long before DI became a fad.

    void qsort(void *base, size_t nmemb, size_t size,
               int(*compar)(const void *, const void *));
Here, `compar` is being injected. When it is needed, it's sweet. But when you start going down Java's way (i.e. the way Java frameworks (over)do it), it's irritating. The point of DI is to inject dependencies rather than hard-coding them. That doesn't mean you have to use IoC containers, or that you have to inject everything.
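For illustration, a small sketch of that injection in use. `qsort` is the standard library function; `ascending`, `descending`, and `sort_ints` are hypothetical helpers. The same sorting code produces a different order depending on which comparator is passed in:

```c
#include <assert.h>
#include <stdlib.h>

/* Two interchangeable comparators for int elements. */
int ascending(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids overflow of x - y */
}

int descending(const void *a, const void *b)
{
    return ascending(b, a);
}

/* The ordering is injected by the caller, not hard-coded here. */
void sort_ints(int *v, size_t n,
               int (*compar)(const void *, const void *))
{
    assert(compar != NULL);
    qsort(v, n, sizeof v[0], compar);
}
```

Calling `sort_ints(v, n, ascending)` and then `sort_ints(v, n, descending)` on the same array reverses the order without touching the sorting code.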
Curious to hear which Java DI frameworks you dislike. I've used Google Guice before on a large project and I found it to be great: it doesn't get in your way and it makes your code a lot more modular and testable.
You're almost always better off using an ABI and shared libraries than doing this.

That said, there are cases where it's useful; but remember: dependency injection is an anti-pattern that increases code complexity to facilitate run-time configuration and testing.

Without a helper IOC framework (that does the hard work of maintaining singleton instances and injecting the right ones in the right places at the right time) doing this is a lot of work in a large project.

...not saying it's a bad thing. It's totally a good thing, especially making your C code testable.

Just be aware that there are limitations and downsides to this beyond merely the potential speed cost in using pointers and optimization issues mentioned above.

Yes, I agree, it's about facilitating run-time configuration and testing. Not sure I agree about the increase in code complexity though. Maybe there's an aspect of code complexity that increases with dependency injection. I personally feel that code complexity correlates more with explicit branches, something that I usually eliminate using DI or polymorphism. And yeah, any way of writing code has its downsides and upsides; it's just a matter of finding balance.
Yeah absolutely; re: code complexity, if it's only one level deep it's not really an issue, but you get into nasty situations when you have recursive calls taking place.

A <--- B, C

C <--- D

D <--- E, F

Now if you want to have function G that calls A, it needs to have a call like this:

void G (void E, void F, void D, void B, void C) { A(B, C(D(E, F))) }

(where void --> whatever function pointer type)

Not pretty. There are totally ways around this to do with grouping injected values or separating apis so the chain is never deeper than one or two, but it really does lead to some hideous code if you're not careful. :)
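One possible shape of the "grouping injected values" workaround, as a sketch only. The struct names and the arithmetic inside `A`, `C`, and `D` are invented for illustration; only the dependency graph (A needs B and C, C needs D, D needs E and F) follows the example above. Each component gets one struct holding its dependencies, so G takes a single argument:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*leaf_fn)(void);               /* B, E, F: no dependencies */

struct d_deps { leaf_fn e, f; };            /* D <--- E, F */
struct c_deps { struct d_deps d; };         /* C <--- D    */
struct a_deps { leaf_fn b; struct c_deps c; }; /* A <--- B, C */

/* Placeholder bodies; the point is the parameter lists. */
int D(const struct d_deps *deps) { return deps->e() + deps->f(); }
int C(const struct c_deps *deps) { return 2 * D(&deps->d); }
int A(const struct a_deps *deps) { return deps->b() + C(&deps->c); }

/* G no longer lists B, C, D, E and F; it receives one wired bundle. */
int G(const struct a_deps *deps)
{
    assert(deps != NULL);
    return A(deps);
}

/* Stand-in leaf functions for wiring things up. */
int one(void)   { return 1; }
int two(void)   { return 2; }
int three(void) { return 3; }
```

The wiring happens once, for example `struct a_deps wiring = { one, { { two, three } } };`, and everything downstream just passes the pointer along.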

As a caveat to what I said as well though, I will admit: what I said only applies if you're on a sane system that lets you use dynamically linked libraries. If you're on a stupid OS that has artificial restrictions (I'm looking at you iOS) this is probably actually not the worst way to go for cleaner code, if you're writing plain C.

Well written! I think that potentially hideous code, the part that puts together groups of injected code, is one of the hardest types of component to get right. My background is in embedded and mobile, so I usually expect a stupid OS or perhaps not even an OS at all (or perhaps writing part of the OS). Artificial restrictions are just silly though.
The second solution looks like a good old callback to me. The other two are just complicating things with additional state.

In general, I try to avoid function pointers in C unless I am sure they make the code clearer. It is often difficult to debug code that tries to be clever with them. Like preprocessor macros they are powerful, but tempting to misuse.

Good old callback, exactly what it is! In a very simple case like this perhaps the additional state is complicating things, but it does have its uses in many cases, as described in the post. I personally think function pointers don't make code more difficult to debug than the dependency injection used in many higher-level programming languages such as Java. Difficulty of debugging is usually the main argument against dependency injection as well. Usually it goes: since it makes it harder to debug, I'd rather make it more coupled. When it's more coupled it's harder to unit test. Since it's harder to unit test, I need to debug more, etc. Personally I'd rather break that chain early. In the end, debugging code with function pointers (or dependency injection) gets easier with time.
The article describes an implementation of the observer pattern in C ... It's well known and used in many open source C projects. But I'm not sure that replacing simple tangled dependencies in the Linux kernel with function pointers is a good idea, because you may introduce overhead in performance-critical code, and you may lose locality of reference when abusing function pointers.
Yes the example used to explain dependency inversion indeed happens to be an observer pattern. Well known yes, at least amongst experienced C programmers, but as mentioned in a comment on the page; "all the more reason that it should be explained in blog posts.".
This article makes me doubt this Arqua tool. Using a callback does not remove coupling. It just makes it dynamic, so static analysis cannot find a direct link.

Arqua would have to find out what value the global gNotifier variable could contain. An exact points-to analysis should find out that it is clientNotify and insert the dependency. A less powerful analysis might approximate that it could be anything. In this case there should be a dependency on every other compilation unit. Since Arqua reports "no dependency", it seems to do a non-conservative approximation, which means it does not report safe results.

That is not really a good quality metric. So why should one optimize against it?

As previously mentioned in one of the other comments; to "facilitate run-time configuration and testing.". Removing the static coupling has a value in itself, making the component isolated to facilitate unit-testing.
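A sketch of what that isolation can look like in a unit test. Only the name `gNotifier` comes from the comments above; the `Notifier` signature (including the `void *` context), `serverSetNotifier`, `serverHandle`, and the recorder are assumptions made for illustration and are not the article's actual code. The test injects a recording fake where the real client callback would go, so the server compiles and runs without the client:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed callback shape; the context pointer follows the advice
 * given earlier in the thread. */
typedef void (*Notifier)(void *ctx, int event);

static Notifier gNotifier = NULL;   /* name taken from the thread */
static void    *gNotifierCtx = NULL;

/* Hypothetical server API. */
void serverSetNotifier(Notifier n, void *ctx)
{
    gNotifier = n;
    gNotifierCtx = ctx;
}

void serverHandle(int event)
{
    if (gNotifier != NULL)
        gNotifier(gNotifierCtx, event);
}

/* Test double standing in for the real client callback. */
struct Recorder {
    int calls;
    int lastEvent;
};

void recordNotify(void *ctx, int event)
{
    struct Recorder *r = ctx;
    assert(r != NULL);
    r->calls++;
    r->lastEvent = event;
}
```

As the parent comment points out, the coupling still exists at run time; what the test gains is that the server's translation unit no longer needs the client to build or run.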