by throwaway7865 1478 days ago
We’ve moved a small-scale business to Kubernetes and it made our lives much easier.

Everywhere I’ve worked, the business has always prioritized high availability and close to zero downtime. No one notices a randomly delivered feature. But if a node fails at night, everybody knows it, clients first of all.

We’ve achieved it all almost out of the box with EKS. Setup with Fargate nodes was literally a one-liner of eksctl.

Multiple environments are separated with namespaces. Leader election between replicas is also easy. Lens is a very simple-to-use k8s IDE.

If you know what you’re doing with Kubernetes (don’t use EC2 for nodes, they fail randomly), it’s a breeze.


We don't have an issue with that last point: we run lots of EC2 EKS nodes and they don't fail randomly. Were you setting resource requests and limits correctly? EKS nodes can fall over randomly if you don't reserve resources on the nodes for system processes and your workloads eat up all the resources. That's probably not well documented either.
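A rough sketch of the reservation arithmetic behind that advice, assuming the kubelet's documented rule that allocatable memory is node capacity minus kube-reserved, system-reserved, and the hard eviction threshold. The function names and the sample figures are made up for illustration; they are not EKS defaults.

```python
def allocatable_mib(capacity_mib, kube_reserved_mib, system_reserved_mib, eviction_hard_mib):
    """Memory left for pods after node-level reservations (all values in MiB)."""
    reserved = kube_reserved_mib + system_reserved_mib + eviction_hard_mib
    if reserved > capacity_mib:
        raise ValueError("reservations exceed node capacity")
    return capacity_mib - reserved


def fits_on_node(pod_requests_mib, allocatable):
    """True if the summed pod memory requests fit in allocatable memory."""
    return sum(pod_requests_mib) <= allocatable


# Hypothetical 16 GiB node; the reservation sizes are illustrative only.
node_allocatable = allocatable_mib(
    capacity_mib=16384,
    kube_reserved_mib=512,
    system_reserved_mib=512,
    eviction_hard_mib=100,
)
print(node_allocatable)
print(fits_on_node([4096, 4096, 4096], node_allocatable))
print(fits_on_node([4096, 4096, 4096, 4096], node_allocatable))
```

With nothing reserved, the fourth 4 GiB pod would appear to fit on paper, and the system processes would be the ones squeezed out.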
EC2 instances are inherently unreliable, and that's not a knock on them. That's exactly the contract you get when using them: you're supposed to plan your architecture around the fact that an EC2 instance could die at any moment. We lose about 2-3 EC2 nodes per day (not our app stopping; Amazon's own instance health goes red) and we couldn't care less.
What percentage of EC2 nodes is that?
Empirically around 0.1%
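Taken together, the two figures imply a rough fleet size. A back-of-the-envelope sketch, assuming the 0.1% is a daily loss rate (the thread does not say so explicitly); the function name is hypothetical:

```python
def implied_fleet_size(losses_per_day, daily_loss_fraction):
    """Fleet size implied by a daily loss count and a daily loss rate."""
    if daily_loss_fraction <= 0:
        raise ValueError("loss fraction must be positive")
    return losses_per_day / daily_loss_fraction


# 2-3 nodes lost per day at an assumed 0.1% daily loss rate.
low = implied_fleet_size(2, 0.001)
high = implied_fleet_size(3, 0.001)
print(f"roughly {low:.0f} to {high:.0f} nodes")
```

If the 0.1% refers to some other time window, the implied fleet size scales accordingly.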
Setting limits is important, but it always has been. Kubernetes nodes typically don't have swap, so without container limits some critical process can get OOM-killed. With swap enabled the failure mode is different: memory grows, pathological swapping ensues, and caches get dropped, which makes disk performance suck, all while the system is shuffling pages between memory and disk. So of course load hits 50+ and the machine turns into a 'black hole'. I've even seen a single VM do that and cause so much disk IO that it took out the whole hypervisor (which had a single RAID volume).
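One way to catch the missing-limit case before it reaches a node is to scan pod specs for containers without a memory limit. A minimal sketch, assuming the spec is already loaded as a plain dict that mirrors the `containers[].resources.limits.memory` layout of a Kubernetes pod spec; the function name and the sample pod are hypothetical:

```python
def containers_without_memory_limit(pod_spec):
    """Return the names of containers that set no resources.limits.memory."""
    missing = []
    for container in pod_spec.get("containers", []):
        # Treat an absent or empty "resources"/"limits" block as no limits.
        resources = container.get("resources") or {}
        limits = resources.get("limits") or {}
        if "memory" not in limits:
            missing.append(container["name"])
    return missing


# Hypothetical pod: only "api" sets a memory limit.
pod_spec = {
    "containers": [
        {
            "name": "api",
            "resources": {
                "requests": {"memory": "256Mi"},
                "limits": {"memory": "512Mi"},
            },
        },
        {"name": "sidecar", "resources": {"requests": {"memory": "64Mi"}}},
        {"name": "worker"},
    ]
}
print(containers_without_memory_limit(pod_spec))
```

A request without a limit, as on the `sidecar` container, still lets the container grow until the node itself runs out of memory.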