
Looking at my company's Rancher dashboard, it looks like I'm currently running about 7500 pods. Assuming 1.5 containers/pod (probably high) then I'm not running 1 container, I'm running about 11 thousand containers right now. Please don't assume I can't understand what you're saying because of any particular level of experience. Your points are just as understandable regardless.

I'm not sure there's a real use case for running multiple versions of the same app at the same time tbh. If the devs have a new version they're trying to push out, then first their branch has to pass automated tests before it can be merged to master, which (mostly) ensures old functionality doesn't fail. Then our deployment pipeline deploys it to staging, makes sure everything is healthy and readiness probes are returning 200, then deploys it to prod, makes sure everything comes up, and finally switches the k8s service to point to the new pod versions. If anything breaks at that point, the old pods are still around and I can swap the k8s services to point to the old deploy instantly.
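To make the "switch the service" step concrete, here's a minimal sketch of what a cutover or rollback could look like. It assumes the old and new pods are told apart by a hypothetical `version` label and that the Service is called `my-app`; both names are made up for illustration and aren't taken from our pipeline.

```python
import json


def selector_patch(app, version):
    """Build a merge patch that repoints a Service's selector at one version.

    The "app" and "version" label keys are hypothetical; use whatever labels
    your Deployments actually put on their pods.
    """
    return {"spec": {"selector": {"app": app, "version": version}}}


def kubectl_patch_command(service, patch):
    """Return the argv for `kubectl patch service ... --type merge -p <json>`."""
    return [
        "kubectl", "patch", "service", service,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Cut over to the new pods, then show the equivalent rollback command.
    cutover = kubectl_patch_command("my-app", selector_patch("my-app", "v2"))
    rollback = kubectl_patch_command("my-app", selector_patch("my-app", "v1"))
    print(" ".join(cutover))
    print(" ".join(rollback))
```

Rolling back is the same call with the old version label, which is why it's effectively instant as long as the old pods are still running.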

If, for example, two versions of libssl are somehow treating the same protocol version differently, then that'd be detected on staging at the latest. If the devs know they need to upgrade protocol versions from (for example) TLS 1.2 to TLS 1.3, then they'll deploy a version that runs on TLS 1.2 and 1.3, then once everything is working deploy a version that works only on TLS 1.3. Nothing actually takes production traffic until we're fully assured it's healthy. We haven't had a maintenance or upgrade outage for at least 3 years.
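As a rough illustration of the two-step TLS rollout, here's a sketch using Python's standard `ssl` module. The helper name `server_context` is hypothetical, and loading the actual certificate and key is left out; the point is only that the transitional release accepts both protocol versions and the final release accepts just one.

```python
import ssl


def server_context(min_version, max_version=ssl.TLSVersion.TLSv1_3):
    """Create a server-side TLS context limited to a range of protocol versions.

    Certificate/key loading (ctx.load_cert_chain) is omitted in this sketch.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = min_version
    ctx.maximum_version = max_version
    return ctx


# Release 1: accept both TLS 1.2 and TLS 1.3 while clients migrate.
transitional = server_context(ssl.TLSVersion.TLSv1_2)

# Release 2: once everything is confirmed working, accept TLS 1.3 only.
final = server_context(ssl.TLSVersion.TLSv1_3)
```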

Could all this be replicated on a VM platform assuming it has an appropriate API? Definitely. But k8s has all this covered already. How do I switch traffic from the old pods to the new pods? The deployment pipeline runs `kubectl apply -f ingress.yaml` and k8s patches all the load balancer configs to point to the new pods. That's the entirety of what I would need to do if it wasn't already automated.

Certificate management is also pretty easy. Each pod pulls a cert from our PKI (HashiCorp Vault) when it starts up. If the leaf cert expires (unlikely, because pods are usually replaced by a new version well before then), the app throws an exception, the pod goes unhealthy, k8s restarts the pod, the new pod gets a new cert, and it's good for another ~year. This is completely automated by k8s.
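The "expired cert makes the pod unhealthy" loop boils down to something like the sketch below. It's an assumption-laden illustration: in our case the app just throws, whereas here a hypothetical health endpoint maps the cert's expiry time to an HTTP status that a liveness probe would see. The function names and the optional safety margin are made up.

```python
from datetime import datetime, timedelta, timezone


def cert_is_healthy(not_after, now, margin=timedelta(0)):
    """Return True while the cert is still valid at `now` plus a safety margin."""
    return now + margin < not_after


def liveness_status(not_after, now, margin=timedelta(0)):
    """HTTP status a hypothetical health endpoint would return to the probe."""
    return 200 if cert_is_healthy(not_after, now, margin) else 503


if __name__ == "__main__":
    expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
    # Prints 200 until the cert's expiry time, 503 afterwards.
    print(liveness_status(expiry, datetime.now(timezone.utc)))
```

Once the probe starts failing, k8s restarts the container, and the fresh start pulls a new cert.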

Cert management for the k8s nodes themselves actually does involve VMs a bit. Some of our clusters are on AWS EC2 and are set up with autoscaling groups, so a node with too little usage gets downscaled. When a node's cert is close to expiring, the node as a whole goes unhealthy, k8s automatically removes all pods from that node and spins up new replicas on other nodes, and EC2 detects that load is low and downscales that node. If spinning up the new pods pushes utilization too high on the other nodes, EC2 spins up new nodes with new certs and everything is fine for another ~year. Other clusters run on on-prem VMs, and we haven't completely automated that yet, so those are still manual restarts.

Every few years the root cert will expire and we'll have to restart all the pods or nodes at once. Pods are easy: just redeploy and they'll all get the new cert, or worst case I can run `kubectl rollout restart` on each deployment to get a rolling replacement. (A bare `kubectl delete --all pods` would also get everything restarted, but direct deletes bypass PodDisruptionBudgets, which only apply to evictions such as `kubectl drain`, so it isn't guaranteed to be gradual.) For nodes, I'll scale up the cluster (raise the minimum size of the EC2 autoscaling group or add more nodes through the VM platform) so there's a bunch of nodes with the new root cert, then drain all the existing nodes, which causes k8s to spin up new app pods on the new, uncordoned nodes. Then I shut down all the old VMs or let EC2 handle it.
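For the node side, the drain step is roughly the following. This is a sketch that only builds the commands rather than running them, and the node names are hypothetical. `kubectl drain` cordons the node and evicts its pods through the eviction API, so PodDisruptionBudgets are respected here.

```python
def drain_commands(old_nodes):
    """Return one `kubectl drain` argv per old node, in the order given.

    --ignore-daemonsets skips DaemonSet-managed pods, which can't be evicted
    to another node; --delete-emptydir-data allows evicting pods that use
    emptyDir volumes (their scratch data is lost).
    """
    return [
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"]
        for node in old_nodes
    ]


if __name__ == "__main__":
    # Hypothetical node names; the new nodes are assumed to be up and Ready.
    for cmd in drain_commands(["old-node-1", "old-node-2"]):
        print(" ".join(cmd))
```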

You're right that k8s doesn't help with app-level storage issues like concurrent access, nor does it help with storage-level issues like backups and replication. I should've been more specific that k8s helps with how the apps connect to storage. While migrating VM-deployed apps to containers I've found a few ways they've done it: config files specifying connection strings, hardcoded strings in code, pulling values from secrets management, requiring that the VM have the fileshare mounted already, etc. In k8s there's one way to do it: the app's manifest includes a PV and a PVC. Ops handles how k8s connects to the storage from there. This isn't really a k8s advantage; you could tell all your devs to use some internal library that abstracts storage and let ops write or maintain that library too. But that really only works with one company at a time, while when we onboard an acquisition that uses k8s they've already got PVs set up, so we just have to migrate those.

My point in saying that k8s abstracts connecting to storage was mostly that it's an industry-standard interface specifically for connecting to storage, which helps eliminate having to figure out how each individual app connects. If security makes a firewall rule that blocks all your VMs from hitting storage, then for VM-deployed apps I've got to ask "ok, did the devs change this config file? Did someone forget to mount the fileshare, or did an update break that? Is it some third option I've never seen before?" while for our k8s-deployed apps I've got one place to start looking using kubectl.
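For anyone who hasn't seen the PVC side of this, here's a minimal sketch of the claim and the bit of the pod spec that references it, built as plain dicts and dumped to JSON (which `kubectl apply -f` accepts as well as YAML). The claim name, storage class, and size are all hypothetical placeholders.

```python
import json


def pvc_manifest(name, storage_class, size):
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_volume(volume_name, claim_name):
    """The entry under a pod's spec.volumes that binds it to the claim."""
    return {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}


if __name__ == "__main__":
    # Hypothetical names; ops decides what the storage class maps to.
    print(json.dumps(pvc_manifest("my-app-data", "standard", "10Gi"), indent=2))
    print(json.dumps(pod_volume("data", "my-app-data"), indent=2))
```

The app only ever names the claim; which backend sits behind the storage class is entirely an ops decision, and that's the single place to start looking when storage connectivity breaks.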

Another point I didn't address is that yes, this does require specific app architectures. The pods have to be stateless, databases are not in k8s and certainly not running alongside the app itself in the same container or pod, concurrent file access is not generally my problem, and security's wacky firewall rules can be fun to implement when I can't say what IP a particular app has. But I think the tradeoffs are generally worth it.

You're right that I'm not the most experienced at large-scale infrastructure problems outside of k8s. I've managed or helped manage a couple of small server racks and a single 6-rack datacenter before, and I work closely with the non-k8s infrastructure team at my current company, but I'm not the one deciding what we're going to do to get off of VMware, for example. What I can say, though, is that between my past experience and the companies we've acquired, there's a lot more variation and lack of best practices among the companies that don't use k8s compared to the ones that do. With the non-k8s companies I have to familiarize myself with the idiosyncratic way each of them handles every aspect of its infrastructure; with the k8s companies I already know at least half of their infrastructure.