by QuentinM 1445 days ago
Sounds about right. And not the first time it has happened either. I recall getting a few of those instant unit 3 panics over the past few years with Ubuntu. Often with things that are not that common in production, like tc (which in our case we were using in production to work around conntrack race conditions), and sometimes we also got non-panicking but absolutely nerve-wracking production issues, like TCP window size calculation overflows after the window went to zero due to a temporary slow consumer - freezing the window size at a few bytes instead of getting a prompt full window recovery.

Not to mention we’ve also had our fair share of production triple faults from bugs in the Intel firmware patches for Spectre, which took weeks to investigate & fix between Intel, AWS, and ourselves struggling to keep our exchange up & running.

And that is why there’s value in the CoreOS/ContainerLinux-like solutions we designed & implemented nearly a decade ago now. Being able to promptly roll back any kernel/system/package upgrades at once - either manually or automatically after a few panics in quick succession are detected - is actually quite awesome. Not to mention the slow update rollout strategy baked into the Omaha controller.
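For the curious, here is a rough Python sketch of the "roll back after a few panics in quick succession" idea. The threshold, the time window, and every name in it are made up for illustration; this is not how CoreOS/Container Linux actually implements it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RollbackPolicy:
    """Hypothetical policy: roll back if too many panics happen close together."""
    max_panics: int = 3             # assumed threshold, not a real default
    window_seconds: float = 600.0   # assumed window, not a real default

    def should_roll_back(self, panic_times: List[float]) -> bool:
        """Return True if max_panics panics fall within any window_seconds span."""
        times = sorted(panic_times)
        # Slide over every run of max_panics consecutive panics and
        # check whether the first and last are within the window.
        for i in range(len(times) - self.max_panics + 1):
            if times[i + self.max_panics - 1] - times[i] <= self.window_seconds:
                return True
        return False


policy = RollbackPolicy()
burst = policy.should_roll_back([0.0, 100.0, 200.0])      # three panics in 200 s
spread = policy.should_roll_back([0.0, 1000.0, 2000.0])   # panics far apart
```

With the assumed defaults, the first call flags a rollback and the second does not. A real system would also need to persist the panic timestamps across reboots and know which previous image to boot into.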

But the reality is that the what-ifs are always the hardest to market, nearly always afterthoughts, and their traction spikes fast and decays fast after major events.

1 comments

It really seems like there’s no good non-Red Hat (but still “production capable”) alternative to CoreOS nowadays, right? It’s pretty much Fedora / Red Hat CoreOS, or go directly to things such as k3os?
The Rancher stack is pretty amazing.

Elemental is pretty close to CoreOS: https://github.com/rancher/elemental/

They even have a way to build arbitrary OS images: https://github.com/rancher/elemental-toolkit

It's pretty great

k3os is in a dying limbo, so now is the time to get some interest in using stuff like it.