|
|
|
|
|
by yunong
3807 days ago
|
|
> You don't debug a unikernel, mostly; you just kill and replace it Curious as to how you would drive to root cause the bugs that caused the crash in the first place? If you don't root cause, won't subsequent versions still retain the same bugs? There are bugs that can only manifest themselves in production. Any system where we don't have the ability to debug and reproduce these classes of problems in prod is essentially a non-starter for folks looking to operate reliable software. |
|
Instead, unikernels seem to me to instead be a good way to harden high-quality, battle-tested server software. Of two main kinds:
• Long-running persistent-state servers that have their own management capabilities. For example, hardening Postgres (or Redis, or memcached) into a unikernel would be a great idea. A database server already cleans and maintains itself internally, and already offers "administration" facilities that completely moot OS-level interaction with the underlying hardware. (In fact, Postgres often requires you to avoid doing OS-level maintenance, because Postgres needs to manage changes on the box itself to be sure its own ACID guarantees are maintained. If the software has its own dedicated ops role—like a DBA—it's likely a perfect fit for unikernel-ification.)
• Entirely stateless network servers. Nginx would make a great unikernel, as would Tor, as would Bind (in non-authoritative mode), as would OpenVPN. This kind of software can be harshly killed+restarted whenever it gets wedged; errors are almost never correlated/clustered; and error rates are low enough (below the statistical noise-floor where the problem may as well have been a cosmic ray flipping bits of memory) that you can just dust your hands of the responsibility for the few bugs that do occur (unless you're operating at Facebook/Google scale.) This is the same sort of software that currently works great containerized.
---
† The best solution for your custom app-tier, if you really want to go the unikernels-for-everything route, might be a hybrid approach. If you run your app in both unikernel-image, and OS-process-on-a-Linux-VM-image setups, you'll get automatic instrumentation "for free" from the Linux-process instances, and production-level performance+security from the unikernel instances. You could effectively think of the Linux-process images as being your "debug build."
Two ways of deploying such hybrid releases come to mind:
• Create cluster-pools consisting of nodes from both builds and load-balance between them. Any problems that happen should eventually happen on an instrumented (OS-process) node. This still requires you to maintain Linux VMs in production, though.
• Create a beta program for your service. Run the Linux-process images on the "beta production" environment, and the unikernel-images on the "stable production" environment. Beta users (a self-selected group) will be interacting with your new code and hopefully making it fall down in a place where it's still instrumented; paying customers will be working with a black-box system that—hopefully—most of the faults will have been worked out of. Weird things that happen in stable can be investigated (for a stateful app) by mirroring the state to the beta environment, or (for a stateless app) by accessing the data that causes the weird behavior through the beta frontend.