Hacker News
by tomxor 1990 days ago
Yes, but this is the entire principle around which microkernels are designed: making the last critical piece of code as small and reliable as possible. Minix3's kernel is <4000 lines of C.

As far as bitflips are concerned, having the critical kernel code occupy fewer bits reduces the probability of a bitflip causing an irrecoverable error.
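
A rough sketch of that reasoning, assuming a single bit flip is equally likely to land anywhere in RAM. The function name `hit_probability` and all of the sizes below are hypothetical, picked only to show the proportionality, not measured from Minix or any real kernel:

```python
def hit_probability(critical_bytes: int, total_bytes: int) -> float:
    """Chance that one uniformly random bit flip lands in the critical region."""
    if total_bytes <= 0 or not 0 <= critical_bytes <= total_bytes:
        raise ValueError("need 0 <= critical_bytes <= total_bytes and total_bytes > 0")
    return critical_bytes / total_bytes


KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

ram = 8 * GIB            # hypothetical machine
small_kernel = 64 * KIB  # hypothetical microkernel image size
large_kernel = 16 * MIB  # hypothetical monolithic kernel image size

p_small = hit_probability(small_kernel, ram)
p_large = hit_probability(large_kernel, ram)

# Under the uniform assumption the risk scales linearly with size.
ratio = p_large / p_small
print(f"small: {p_small:.3e}  large: {p_large:.3e}  ratio: {ratio:.0f}x")
```

Under this model the only thing that matters is the ratio of the two footprints, which is why shrinking the critical code helps but never gets the probability to zero.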


Yes, I understand this -- basic risk mitigation by reducing the size of your vulnerability.

(I'll archaic brag a bit by mentioning I used to be a heavy user of Minix - my floppy images came in over an X.25 network - and saw Andy Tanenbaum give his Minix 3 keynote at FOSDEM about a decade ago. I'm a big fan.)

Anyway, while reducing risk this way is laudable, and will improve your fleet's health, as per TFA it's a poor substitute for simply stumping up for ECC, with bad economics and worse politics behind it.

I'll also note that, for example, Google's sitting on ~3 million servers, so that ~4k LoC just blew out to 12,000,000,000 LoC -- and that's for the hypervisors only.

Multiply that out by ~50 to include the VMs' microkernels, and the amount of memory you've now got that is highly susceptible to undetected bit-flips is well into the mind-blowing range.
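
For anyone who wants to check the back-of-envelope numbers, here is the arithmetic spelled out. The inputs are the rough figures quoted above (~3 million servers, ~4k LoC, a ~50x factor for VMs); the variable names are just for illustration:

```python
servers = 3_000_000  # rough fleet size quoted above
kernel_loc = 4_000   # ~4k lines of C per microkernel
vm_factor = 50       # rough multiplier to include the VMs' microkernels

# One hypervisor microkernel per server.
hypervisor_loc = servers * kernel_loc

# Scale up to cover the microkernels running inside the VMs as well.
total_loc = hypervisor_loc * vm_factor

print(f"hypervisors only: {hypervisor_loc:,} LoC")
print(f"including VMs:    {total_loc:,} LoC")
```

That gives 12,000,000,000 LoC for the hypervisors and 600,000,000,000 LoC once the VMs are included. These are lines of source summed across the fleet, not bytes of RAM, so treat them as a scale indicator only.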

Oh, I'm not saying it's the single best solution; I guess I got carried away in the argument. It's simply a scenario where the concept shines, yet it's an entirely artificial scenario, and I agree ECC is the correct way.