by ranger207 699 days ago
Docker's good at packaging, and Kubernetes is good at providing a single API to do all the infra stuff like scheduling, storage, and networking. I think that if someone sat down and tried to create an idealized VM management solution that covered everything between "dev pushes changes" to "user requests website" then it'd probably have a single image for each VM to run (like Docker has a single image for each container to run), and then management of VM hosts, storage, networking, and scheduling VMs to run on which host would wind up looking a lot like k8s. You could certainly do that with VMs, but for various path dependency reasons people do that with containers instead, and nobody's got a well-adopted system for doing the same with VMs.

I'm sorry, but:

* Docker isn't good at packaging. When people talk about packaging, they usually understand it to include dependency management. For Docker to be good at packaging it should be able to create dependency graphs and allow users to re-create those graphs on their systems. Docker has no way of doing anything close to that. Aside from that, Docker suffers from the lack of reproducible builds, lack of upgrade protocols... It's not good at packaging... maybe it's better than something else, but there's a lot of room for improvement.

* Kubernetes doesn't provide a single API to do all the infra stuff. In fact, it provides so little, it's a mystery why anyone would think that. All that stuff like "storage", "scheduling", "networking" that you mentioned comes as add-ons (eg. CSI, CNI) which aren't developed by Kubernetes, aren't following any particular rules, have their own interfaces... Not only that, Kubernetes' integration with CSI / CNI is very lacking. For example, there's no protocol for upgrading these add-ons when upgrading Kubernetes. There's no generic interface that these add-ons have to expose to the user in order to implement common things. It's really anarchy what's going on there...

There are lots of existing VM management solutions, eg. OpenStack, vSphere -- you don't need to imagine them, they exist. They differ from Kubernetes in many ways. Very superficially, yet importantly, they don't offer an easy way to automate them. For very simple tasks Kubernetes offers a very simple solution for automation, i.e. write some short YAML file. Automating eg. ESX comes down to using a library like govmomi (or something that wraps it, like Terraform). But, in the mentioned case, Terraform only manages deployment, and doesn't take care of the post-deployment maintenance... and so on.

However, the more you deal with the infra, the more you realize that the initial effort is an insignificant fraction of the overall complexity of the task you need to deal with. And that's where the management advantages of Kubernetes start to seem less appealing. I.e. you realize that you will have to write code to manage your solution, and there will be a lot of it... and a bunch of YAML files won't cut it.

Docker's dependency management solution is "include everything you need and specify a standard interface for the things you can't include like networking." There's no concern about "does the server I'm deploying to have the right version of libssl" because you just include the version you need. At most, you have to ask "does the server I'm deploying to have the right version of Docker/other container runtime for the features my container uses", which is a much smaller set of things that can change. Reproducible builds, yeah, but that's traditionally more a factor of making sure your build scripts are reproducible than the package management itself. Or to put it another way, Dockerfiles are just as reproducible as .debs or .rpms. Upgrading is replacing the container with a new one.

Kubernetes is an abstraction layer that (mostly) hides the complexity of storage, networking, etc. Yeah the CNIs and CSIs are complex, but for the appdev it's reduced to "write a manifest for a PV and a PVC" or "write a manifest for a service and/or ingress". In my company ops has standardized that so you add a key to your values.yaml and it'll template the rest for you. Ops has to deal with setting up that stuff in the first place, which you have to do regardless, but it's better than every appdev setting up their own way to do things.
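To make the "add a key to your values.yaml and it'll template the rest" idea concrete, here is a minimal sketch in Python rather than an actual Helm template. The `storage` key, its `size`, `class`, and `accessMode` fields, the defaults, and the `render_pvc` helper are all assumptions invented for illustration, not the commenter's real setup; only the PersistentVolumeClaim field names follow the standard Kubernetes manifest layout.

```python
# Hypothetical sketch: expand one small block from an app's values
# into a PersistentVolumeClaim manifest. The "storage" key and its
# fields are invented for this example.

def render_pvc(app_name: str, values: dict) -> dict:
    """Build a PVC manifest (as a dict) from the app's 'storage' block."""
    storage = values["storage"]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{app_name}-data"},
        "spec": {
            # Defaults stand in for whatever ops has standardized on.
            "accessModes": [storage.get("accessMode", "ReadWriteOnce")],
            "storageClassName": storage.get("class", "standard"),
            "resources": {"requests": {"storage": storage["size"]}},
        },
    }


# The appdev only supplies the size; everything else is templated.
manifest = render_pvc("billing", {"storage": {"size": "10Gi"}})
```

The point of the design is that the appdev touches one key while ops owns the defaults, so changing the storage class is a change in one place.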

My company's a conglomerate of several acquisitions. I'm from a company that was heavy into k8s, and now I'm working on getting everyone else's apps that are currently just deployed to a VM into a container and onto a k8s cluster instead. I might shouldn't've said k8s was an API per se, but it is a standardized interface that covers >90% of what people want to do. It's much easier to debug everything when it's all running on top of k8s using the same k8s concepts than it is debugging why each idiosyncratic VM isn't working. Could you force every app to use the same set of features on VMs? Want a load balancer, just add a value to your config and the deployment process will add your VM to the F5? Yeah, it's possible, but we'd have to build it, or find a solution offered by a particular vendor. k8s already has that written and everyone uses it

This is super, super, super naive. You, essentially, just solved for the case of one. But now you need to solve for N.

Do you seriously believe you will never be in a situation where you have to run... two containers?.. With two different images? If my experience is anything to go by, even postcard Web sites often use 3-5 containers. I just finished deploying a test of our managed Kubernetes (technically, it uses containerd, but it could be using Docker). And it has ~60 containers. And this is just the management part. I.e. no user programs are running there. It's a bunch of "operators", CNIs, CSIs etc.

In other words: if your deployment was so easy that it could all fit into a single container -- you didn't have a dependency problem in the first place. But once you get a realistic-size deployment, you now have all the same problems. If libssl doesn't implement the same version of TLS protocol in two containers -- you are going to have a bad time. But now you also amplified this problem because you need certificates in all containers! Oh and what fun it is to manage certificates in containers!

> Kubernetes is an abstraction layer that (mostly) hides the complexity of storage networking etc

Now, be honest. You didn't really use it, did you? The complexity in eg. storage may manifest in many different ways. None of them have anything to do with Kubernetes. Here are some examples: how can multiple users access the same files concurrently? How can the same files be stored (replicated) in multiple places concurrently? What about first and second together? Should replication happen at the level of block device or filesystem? Should snapshots be incremental or full? Should user ownership be encoded into storage, or should there be an extra translation layer? Should storage allow discards when dealing with encryption? And many, many more.

Kubernetes doesn't help you with these problems. It cannot. It's not designed to. You have all the difficult storage problems whether you have Kubernetes or not. What Kubernetes offers is a possibility for the storage vendors to expose their storage product through it. Which is nothing new. All those storage products can be exposed through some other means as well.

In practice, storage vendors who choose to expose their products through Kubernetes usually end up exposing only a limited subset of the storage functionality that way. So, not only does storage through Kubernetes not solve your problems: it adds more of them. Now you may have to work around the restrictions of Kubernetes if you want to use some unavailable features (think, for example, of all the Ceph CLI that you are missing when using Ceph volumes in Kubernetes: it's hundreds of commands that are suddenly unavailable to you).

----

You seem like an enthusiastic person. And you probably truly believe what you write about this stuff. But you are in way over your head. You aren't really an infra developer. You kind of don't even really recognize the general patterns and problems of this field. And that's OK. You don't have to be / do that. You just happen to be a new car owner who learned how to change oil on your own, and you are trying to preach to a seasoned mechanic about the benefits and downsides of different engine designs :) Don't take it to heart. It's one of those moments where maybe years later you'll suddenly recall this conversation and feel a spike of embarrassment. Everyone has that.

Looking at my company's Rancher dashboard, it looks like I'm currently running about 7500 pods. Assuming 1.5 containers/pod (probably high) then I'm not running 1 container, I'm running about 11 thousand containers right now. Please don't assume I can't understand what you're saying because of any particular level of experience. Your points are just as understandable regardless.

I'm not sure there's a real use case for running multiple versions of the same app at the same time tbh. If the devs have a new version they're trying to push out then first their branch has to pass automated tests before it can be merged to master, (mostly) ensuring old functionality doesn't fail. Then our deployment pipeline deploys it to staging, makes sure everything is healthy and readiness probes are returning 200, then deploys it to prod, makes sure everything comes up, and finally switches the k8s service to point to the new pod versions. If anything breaks at that point, the old pods are still around and I can swap the k8s services to point to the old deploy instantly.
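The switch-and-rollback step described above can be sketched as a toy model. This is not the k8s API and not the commenter's pipeline: `Deployment`, `Service`, `switch_to`, and `rollback` are hypothetical names, and the `ready` flag stands in for "readiness probes are returning 200".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    version: str
    ready: bool  # stand-in for the readiness probe result


@dataclass
class Service:
    """Toy stand-in for a k8s Service: traffic goes to `active`."""
    active: Deployment
    previous: Optional[Deployment] = None

    def switch_to(self, new: Deployment) -> bool:
        # Refuse to route traffic to pods that are not ready.
        if not new.ready:
            return False
        self.previous, self.active = self.active, new
        return True

    def rollback(self) -> None:
        # The old pods are still around, so rollback is a pointer swap.
        if self.previous is not None:
            self.active, self.previous = self.previous, self.active


svc = Service(active=Deployment("v1", ready=True))
rejected = svc.switch_to(Deployment("v2", ready=False))  # stays on v1
accepted = svc.switch_to(Deployment("v3", ready=True))   # now serving v3
svc.rollback()                                           # back on v1
```

Keeping the previous deployment around is what makes the rollback instant: nothing is rebuilt or redeployed, only the selector changes.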

If, for example, two versions of libssl are somehow treating the same protocol version differently, then that'd be detected on staging at the latest. If the devs know they need to upgrade protocol versions from (for example) TLS 1.2 to TLS 1.3, then they'll deploy a version that runs on TLS 1.2 and 1.3, then once everything is working deploy a version that works only on TLS 1.3. Nothing actually takes production traffic until we're fully assured it's healthy. We haven't had a maintenance or upgrade outage for at least 3 years.
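The two-step protocol upgrade can be illustrated with Python's standard `ssl` module. This is only a sketch of the idea, assuming a Python server and an OpenSSL build with TLS 1.3 support; the `server_context` helper and the `transitional` / `final` names are made up here, and a real server would also load a certificate and key.

```python
import ssl


def server_context(minimum: ssl.TLSVersion) -> ssl.SSLContext:
    """Create a server-side context that rejects anything below `minimum`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = minimum
    return ctx


# Step 1: a transitional release that accepts TLS 1.2 and TLS 1.3.
transitional = server_context(ssl.TLSVersion.TLSv1_2)

# Step 2: once every client speaks TLS 1.3, ship a release that
# refuses TLS 1.2 entirely.
final = server_context(ssl.TLSVersion.TLSv1_3)
```

Because each step is its own deploy, each one goes through the same staging checks before it takes production traffic.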

Could all this be replicated on a VM platform assuming it has an appropriate API? Definitely. But k8s has all this covered already. How do I switch traffic from the old pods to the new pods? The deployment pipeline runs `kubectl apply -f ingress.yaml` and k8s patches all the load balancer configs to point to the new pods. That's the entirety of what I would need to do if it wasn't already automated.

Certificate management is also pretty easy. Each pod pulls a cert from our PKI (Hashicorp Vault) when it starts up. If the leaf cert expires (unlikely because pods are usually replaced by a new version well before then) then the app throws an exception, the pod goes unhealthy, k8s restarts the pod, the new pod gets a new cert, and it's good for another ~year. This is completely automated by k8s.
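The expire, go unhealthy, restart, re-issue loop can be modelled in a few lines. This is a simplified simulation, not the Vault or k8s APIs: `cert_is_healthy`, `reconcile`, and `issue_cert` are hypothetical names, and the one-year lifetime is taken from the "~year" mentioned above.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def cert_is_healthy(not_after: datetime, now: datetime) -> bool:
    """Health check: the pod is healthy while its leaf cert is unexpired."""
    return now < not_after


def reconcile(
    not_after: datetime,
    now: datetime,
    issue_cert: Callable[[datetime], datetime],
) -> datetime:
    """Return the cert expiry the pod ends up with after one health check."""
    if cert_is_healthy(not_after, now):
        return not_after
    # Unhealthy: the pod is restarted and pulls a fresh cert on startup.
    return issue_cert(now)


def issue_cert(now: datetime) -> datetime:
    # Hypothetical PKI stand-in: every new cert is valid for one year.
    return now + timedelta(days=365)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expired = reconcile(now - timedelta(days=1), now, issue_cert)
still_valid = reconcile(now + timedelta(days=30), now, issue_cert)
```

In this model an expired cert is replaced with one valid for a year from the restart, while a still-valid cert is left untouched.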

Cert management for the k8s nodes themselves actually does involve VMs a bit. Some of our clusters are on AWS EC2 and are set up with autoscaling groups so that if a node has too little usage it'll be downscaled, so if a cert is close to expiring then the node as a whole goes unhealthy, k8s automatically removes all pods from that node and spins up new replicas on other nodes, EC2 detects that load is low and downscales that node, and if spinning up new pods caused the other nodes to have too much utilization then EC2 will spin up new nodes with new certs and everything will be fine for another ~year. Other clusters run on on-prem VMs and we haven't completely automated that yet so those are still manual restarts.

Every few years the root cert will expire and we'll have to restart all the pods or nodes at once. Pods are easy; just redeploy and they'll all get the new cert, or worst case I can run `kubectl delete --all pods` and the PodDisruptionBudgets will ensure that there's a rolling rollout. For nodes, I'll scale up the cluster (increase min replicas in EC2 or add more nodes through the VM platform) so there's a bunch of nodes with the new root cert, then drain all the existing nodes, which will cause k8s to spin up new app pods on the new uncordoned nodes, then shut down all the old VMs or let EC2 handle it.

You're right that k8s doesn't help with app-level storage issues like concurrent access, nor does it help with storage-level issues like backups and replication. I should've been more specific that k8s helps with how the apps connect to storage. While migrating VM-deployed apps to containers I've found a few ways they've done it: config files specifying connection strings, hardcoded strings in code, pulling values from secrets management, requiring that the VM have the fileshare mounted already, etc. In k8s there's one way to do it: the app's manifest includes a PV and a PVC. Ops handles how k8s connects to the storage from there. This isn't really a k8s advantage; you could tell all your devs to use some internal library that abstracts storage and let ops write or maintain that library too. But that really only works with one company at a time, while when we onboard an acquisition that uses k8s they've already got PVs set up so we just have to migrate those. My point in saying that k8s abstracts connecting to storage was mostly about how it's an industry standard interface specifically for connecting to storage, which helps eliminate having to figure out how each individual app connects. If security makes a firewall rule that blocks all your VMs from hitting storage then for VM-deployed apps I've got to look "ok did the devs change this config file? Did someone forget to mount the fileshare or did an update break that? Is it some third option I've never seen before?" while for our k8s-deployed apps I've got one place to start looking using kubectl.

Another point I didn't address is that yes this does require specific app architectures. The pods have to be stateless, databases are not in k8s and certainly not running alongside the app itself in the same container or pod, concurrent file access is not generally my problem, and security's wacky firewall rules can be fun to implement when I can't say what IP a particular app has. But I think the tradeoffs are generally worth it.

You're right I'm not the most experienced at large scale infrastructure problems outside of k8s. I've managed or helped manage a couple of small server racks and a single 6-rack datacenter before, and I work closely with the non-k8s infrastructure team at my current company, but I'm not the one deciding what we're going to do to get off of VMware for example. What I can say though is that between my past experience and the companies we've acquired, there's a lot more variation and lack of best practices among the companies that don't use k8s compared to the ones that do. With the non-k8s companies I have to familiarize myself with the idiosyncratic way they each handle every aspect of their infrastructure; with the k8s companies I already know at least half of their infrastructure.