Hacker News new | ask | show | jobs
by crb 3403 days ago
> Why doesn't Kubernetes destroy these nodes when they've been out of commission for 3-4 hours?

Kubernetes isn't responsible for the lifecycle of its nodes. It can run in a DC where "destroying a node" might mean paging a tech to turn off a server. Something external - in your case, kops & your ASG - is responsible for the nodes that Kubernetes runs on. That's a deliberate design choice.

It should make a correct decision not to schedule work there, which it sounds like it did.

Given that, your other questions are hard to answer. kubelet is a process that runs on the nodes. So is docker. If you can't get into the machine to diagnose the fault, I'd encourage you to set up some monitoring/log shipping off the node so you can see what the state was when it failed.

There's nothing inherently "Kubernetes" about this diagnosis - it's more EC2, node/kernel/OS and Docker troubleshooting, in that order.

1 comments

Correct, Kubernetes is not responsible for the nodes. I would build a health check into your Autoscale Group (I don't know exactly how to do this on AWS, but am happy to show you an example on GCP - aronchick (at) google).

If you can't get to the machine, there are a million reasons why this would be the case - but ssh is a totally separate process, it's way outside of Kubernetes. VERY commonly, you've run out of memory and processes are fighting among themselves (especially since EVERYTHING seems to be failing), but this is total speculation. OS issues are common too - I've spun up clusters switching from one distro to another, same config, and everything worked great.

Disclosure: I work at Google on Kubernetes.

Speaking of distros and considering your background. What would be the "best" Distro for running Kubernetes ?
If all the OS does is provide a minimal surface for running containers, I'd focus on whatever gives me the best security, manageability and updates.

The Container Optimised OS is what GKE uses on Google Cloud Platform https://cloud.google.com/container-optimized-os/docs/

It's conceptually very similar to CoreOS' Container Linux, so I might try that if I were looking at Kubernetes elsewhere and wanted a container-only OS.

If I am running an environment with multiple purposes - some container hosts, some regular machines - I'd err on the side of "who is my current vendor/what does my ops team support and know best".

Great thanks for the valuable infos. We are running SLES12 and also a Suse Openstack Cloud on bare metal and only recently Suse has announced their container strategy (SLE MicroOS Distro) but we haven't had time to evaluate it yet. At a recent DevConf I saw some interesting talks about immutable container hosts such Fedora Atomic. Seems that there is a lot of work done in this area.