Hacker News
by sergiolp 3989 days ago
Some years ago, I spent a lot of time studying GNU Mach and Hurd (I also made some small contributions). I think I can say that I know both pretty well. I even started a project to preserve the OSF Mach + MkLinux source code (https://github.com/slp/mkunity), a very cool project for its time (circa 1998).

These days I prefer to do my kernel hacking on monolithic kernels, mainly NetBSD. I stopped working on Mach, Hurd and other experimental microkernels (there are a bunch out there) because it was becoming increasingly frustrating.

If you asked me to define the problem with microkernels in one word, it would be "complexity". And it's a kind of complexity that impacts everything:

- Debugging is hard: On monolithic kernels, you have a single image, with both code and state. Hunting a bug is just a matter of jumping into the internal debugger (or attaching an external one, or generating a dump, or...) and looking around. On Hurd, the state is spread among Mach and the servers, so you'll have to look at each one trying to follow the trail left by the bug.

- Managing resources is hard: Mach knows everything about the machine, but nothing about the user. The server knows everything about the user, but nothing about the machine. And keeping them in sync is far too expensive. Go figure.

- Obtaining reasonable performance is har... impossible: You want to read() a couple of bytes from disk? Good: prepare a message, call into Mach, yield for a while until the server is scheduled, copy the message, unmarshal it, process the request, prepare another message asking Mach to read from disk, call into Mach, yield waiting to be rescheduled, obtain the data, prepare the answer, call into Mach, yield waiting to be rescheduled, obtain your 2 bytes. Easy!
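To make that read() path a bit more concrete, here is a toy Python model that only counts kernel entries, message copies and context switches for each design. The step breakdown follows the description above (with the disk driver living inside Mach, as in that walkthrough); the function names and the "one unit per step" accounting are assumptions for illustration, not measurements of Hurd:

```python
from collections import Counter

def monolithic_read(disk: bytes, offset: int, n: int) -> tuple[bytes, Counter]:
    """Monolithic path: one trap, the kernel copies the bytes straight to the caller."""
    cost = Counter()
    cost["trap"] += 1                  # the read() system call
    data = disk[offset:offset + n]     # in-kernel driver reads the block
    cost["copy"] += 1                  # kernel buffer -> user buffer
    return data, cost

def multiserver_read(disk: bytes, offset: int, n: int) -> tuple[bytes, Counter]:
    """Multiserver path: client -> fs server -> Mach (disk) -> fs server -> client."""
    cost = Counter()
    # 1. Client marshals a request and sends it through Mach to the fs server.
    cost["trap"] += 1
    cost["copy"] += 1                  # request message: client -> fs server
    cost["switch"] += 1                # fs server is scheduled and unmarshals it
    # 2. The fs server asks Mach to read from disk, then waits.
    cost["trap"] += 1
    cost["switch"] += 1                # fs server is rescheduled once the data is ready
    data = disk[offset:offset + n]
    cost["copy"] += 1                  # data: Mach -> fs server
    # 3. The fs server marshals a reply and sends it back through Mach.
    cost["trap"] += 1
    cost["switch"] += 1                # client is rescheduled
    cost["copy"] += 1                  # reply: fs server -> client
    return data, cost

if __name__ == "__main__":
    disk = bytes(range(256))
    _, mono = monolithic_read(disk, 10, 2)
    _, multi = multiserver_read(disk, 10, 2)
    print("monolithic: ", dict(mono))
    print("multiserver:", dict(multi))
```

Both paths return the same 2 bytes; the multiserver one just pays three traps, three copies and three scheduling round trips to get them, before you even count the marshalling work.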

In the end, Torvalds was right. The user doesn't want to work with the OS; he wants to work with his application. This means the OS should be as invisible as possible and fulfill userland requests along the shortest path. Microkernels don't meet this requirement, so from a user's perspective, they fail natural selection.

That said, if you're into kernels, microkernels are different and fun! Don't miss the opportunity to do some hacking on one of them. Just don't be a fool like me, and avoid becoming obsessed with trying to achieve the impossible.


It's like the argument about excessive modularity in software design in general: you can split a system into so many little pieces that each one of them becomes very (deceptively) simple, but in doing so you've also introduced a significant amount of extra complexity in the communication between those pieces.

Personally, I think modularity is good up to the extent that it reduces complexity by removing duplication, but beyond that it's an unnecessary abstraction that obfuscates more than simplifies.
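One back-of-the-envelope way to put numbers on that extra communication complexity (a rough upper bound, assuming any piece might need to talk to any other): splitting a system into n pieces gives this many potential pairwise interfaces.

```latex
C(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad C(4) = 6, \qquad C(20) = 190
```

Going from 4 pieces to 20 multiplies the number of pieces by 5 but the potential interfaces by about 32, even though each individual piece gets simpler.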

The communication would've happened anyway. Now it just happens through a common mechanism with strong isolation. The fact that all the most resilient systems, especially in the safety-critical space, are microkernels speaks for itself. For instance, MINIX 3 is already quite robust for a system that's had hardly any work at all on it. Windows and UNIX systems each took around a decade to approach that. Just the driver isolation by itself goes a long way.

Now, I'd prefer an architecture where we can use regular programming languages and function calls. A number of past and present hardware architectures are designed to protect things such as pointers or control flow. The ones in production are not, but they have MMUs and at least two rings. So apps on them will get breached because of the inherently broken architecture, yet they can also be isolated through a microkernel architecture with interface protections. The microkernel is really a kludgey solution to a problem caused by stupid hardware.

There still hasn't been a single monolithic system that matches their reliability, security, and maintainability without clustering, though.

>For instance, MINIX 3 is already quite robust for a system that's had hardly any work at all on it. Windows and UNIX systems each took around a decade to approach that. Just the driver isolation by itself goes a long way.

MINIX3 also has hardly any work done WITH it, so I don't think we can compare it to Windows and UNIX systems regarding robustness, unless we submit it to the same wide range of scenarios, use cases and work loads...

I'd like to see a battery of tests to see where it's truly at. Yet, there's still not a MINIX Hater's Handbook or something similar. That's more than UNIX's beginnings can say. ;)

Communication would've happened, but probably between far fewer actors. So you have a communication channel that is orders of magnitude slower, plus greater communication needs. Not good.

That said, on the reliability point, I agree with you. If you're building a specialized system and reliability is your main concern, microkernels + multiservers are the way to go (or perhaps virtualization with hardware extensions, but that is still a fairly new technology for some industries).

You'll probably need to add orthogonal persistence to the mix to properly recover from a server failure, or some alternative way to sync state, which will also have an impact on performance. But again, you're gaining reliability in exchange.
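A toy sketch of that trade-off, loosely in the spirit of MINIX 3's reincarnation server: a supervisor restarts a crashed server and restores its last checkpointed state. Everything here (CounterServer, Supervisor, checkpoint_every) is hypothetical and in-process; a real system would do this across address spaces and persist to stable storage:

```python
class CrashError(Exception):
    """Raised by the hypothetical server to simulate a fault."""

class CounterServer:
    """Toy server whose only state is a dict (stand-in for e.g. a file server)."""
    def __init__(self, state=None):
        self.state = dict(state or {})   # copy, so restores never alias the checkpoint

    def handle(self, key: str) -> int:
        if key == "crash":
            raise CrashError("simulated server fault")
        self.state[key] = self.state.get(key, 0) + 1
        return self.state[key]

class Supervisor:
    """Restarts a crashed server from its last checkpoint (hypothetical design)."""
    def __init__(self, checkpoint_every: int = 1):
        self.checkpoint_every = checkpoint_every
        self.checkpoint = {}
        self.server = CounterServer()
        self.handled = 0
        self.restarts = 0

    def call(self, key: str):
        try:
            result = self.server.handle(key)
        except CrashError:
            # Isolation: only this server died; rebuild it from the saved state.
            self.restarts += 1
            self.server = CounterServer(self.checkpoint)
            return None
        self.handled += 1
        # Checkpointing is the performance price: an extra state copy
        # every `checkpoint_every` requests.
        if self.handled % self.checkpoint_every == 0:
            self.checkpoint = dict(self.server.state)
        return result
```

With checkpoint_every=1, calling "a", "a", "crash", "a" returns 1, 2, None, 3: the crash is contained and no state is lost. With checkpoint_every=3 the same sequence ends with 1, because the updates since the last checkpoint are gone. Cheaper checkpoints mean more lost state, which is exactly the reliability vs. performance exchange described above.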

The communication channel does get slower. The good news is that applications are often I/O bound: lots of communication can happen between I/O operations if the system is designed for that. One trick used in the 90s was to modify a processor to greatly reduce both context-switching and message-passing overhead. A similar thing could be done today.

Of course, if one can modify a CPU, I'd modify it to eliminate the need for message-passing microkernels. :)
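The "I/O bound" argument can be sketched with a toy cost model: each message pays a fixed IPC overhead, each request pays its own real work (e.g. waiting on the disk), and several requests may share one message. The model, the function name and all the numbers in the usage note are assumptions in arbitrary units, not measurements:

```python
import math

def ipc_overhead_share(requests: int, batch_size: int,
                       per_message: float, per_request: float) -> float:
    """Fraction of total time spent on message-passing overhead.

    Toy model (an assumption): every message costs a fixed `per_message`
    no matter how many requests it carries; every request costs
    `per_request` of actual work such as waiting on I/O.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    messages = math.ceil(requests / batch_size)
    overhead = messages * per_message
    return overhead / (overhead + requests * per_request)
```

With 1000 requests, a message cost of 10 and a request cost of 1, one message per request spends 10/11 (about 91%) of the time on IPC; batching 100 requests per message drops that to 1/11. If each request instead costs 1000 because it waits on I/O, even unbatched messaging is only about 1% of the total, which is the point about I/O-bound applications.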

I think this is a good example of the law of conservation of complexity[1]. You can't reduce complexity, you can only change what's complex. In the case of monolithic kernels versus microkernels, it sounds like going to a microkernel moves the complexity from the overall design into the nuts and bolts of interprocess communication.

[1] https://en.wikipedia.org/wiki/Law_of_conservation_of_complex...

You've just hit the nail right on the head.

>If you'd ask me to define the problem with microkernels with one word, that would be "complexity".

The problem with Mach, you mean. All the examples you listed are specific to it.

I don't know about other implementations, but I remember the original design of l4hurd (based on L4Ka) was even more complex. I'd say the same applies to all "pure" multiserver designs.

Check out Genode.org, MINIX 3, or QNX. They seem to have gotten a lot more done than Hurd despite being microkernel-based OSes. KeyKOS is one of the best from back in the day, with EROS being a nice x86 variant of it. Turaya Desktop is based on the Perseus Framework.

There are many working microkernel-based systems in production, from timesharing to embedded to desktop. Hurd's and Mach's problems are most likely due to their own design choices.

I don't know about the others, but at least QNX and MINIX 3 both cheated a little, e.g. by allowing servers to write directly into the memory of other user-space programs.

Also, the presence of microkernel + multiserver systems is still quite marginal compared with their monolithic counterparts.
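The "cheat" of letting a server write straight into a client's memory mostly saves copies. Here is a toy Python comparison, using a memoryview as a stand-in for whatever access the kernel grants the server. The function names are hypothetical, and this is not a description of how QNX or MINIX 3 actually implement it:

```python
def reply_by_message(data: bytes, client_buf: bytearray) -> int:
    """Strict message passing: the server marshals a reply message, then the
    kernel copies that message into the client's buffer. Returns bytes copied."""
    message = bytearray(data)                 # copy 1: data -> server's reply message
    client_buf[:len(message)] = message       # copy 2: kernel delivers it to the client
    return 2 * len(message)

def reply_by_direct_write(data: bytes, client_window: memoryview) -> int:
    """Shortcut: the server has been given a window onto the client's buffer
    and writes into it once. Returns bytes copied."""
    client_window[:len(data)] = data          # single copy straight into client memory
    return len(data)

if __name__ == "__main__":
    buf = bytearray(8)
    print(reply_by_message(b"abcd", buf), bytes(buf[:4]))
    buf2 = bytearray(8)
    print(reply_by_direct_write(b"abcd", memoryview(buf2)), bytes(buf2[:4]))
```

Both deliver the same bytes, but the direct write halves the copying, at the price of the server now being trusted with (part of) the client's address space, which is why it counts as cheating on the pure multiserver model.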