Hacker News new | ask | show | jobs
by msiemens 696 days ago
I know it's this is a rather long tangent and not the main point of the article, but regarding "Docker Swarm over Kubernetes", I've had a ton of bad experiences at my employer running a production Swarm cluster. Among them:

- Docker Swarm and Docker Compose use different parsers for `docker-compose.yaml` files, which may lead to the same file working with Compose but not with Swarm ([1]).

- A Docker network only supports up to 128 joined containers (at least when using Swarm). This is due to the default address space for a Docker network using a /24 network (which the documentation only mentions in passing). But, Docker Swarm may not always show error message indicating that it's a network problem. Sometimes services would just stay in "New" state forever without any indictation what's wrong (see e.g. [2]).

- When looking a up a service name, Docker Swarm will use the IP from the first network (sorted lexically) where the service name exists. In a multi-tenant setup, where a lot of services are connected to an ingress network (i.e. Taefik), this may lead to a service connecting to a container from a different network than expected. The only solution is to always append the network name to the service name (e.g. service.customer-network; see [3]).

- Due to some reason I still wasn't able to figure out, the cluster will sometimes just break. The leader loses its connection to the other manager nodes, which in turn do NOT elect a new leader. The only solution is to force-recreate the whole cluster and then redeploy all workloads (see [4]).

Sure, our use case is somewhat special (running a cluster used by a lot of tenants), and we were able to find workarounds (some more dirty than others) to most of our issues with Docker Swarm. But what annoys me is that for almost all of the issues we had, there was a GitHub ticket that didn't get any official response for years. And in many cases, the reporters just give up waiting and migrate to K8s out of despair or frustration. Just a few quotes from the linked issues:

> We, too, started out with Docker Swarm and quickly saw all our production clusters crashing every few days because of this bug. […] This was well over two years (!) ago. This was when I made the hard decision to migrate to K3s. We never looked back.

> We recently entirely gave up on Docker Swarm. Our new cluster runs on Kubernetes, and we've written scripts and templates for ourselves to reduce the network-stack management complexities to a manageable level for us. […] In our opinion, Docker Swarm is not a production-ready containerization environment and never will be. […] Years of waiting and hoping have proved fruitless, and we finally had to go to something reliable (albeit harder to deal with).

> IMO, Docker Swarm is just not ready for prime-time as an enterprise-grade cluster/container approach. The fact that it is possible to trivially (through no apparent fault of your own) have your management cluster suddenly go brainless is an outrage. And "fixing" the problem by recreating your management cluster is NOT a FIX! It's a forced recreation of your entire enterprise almost from scratch. This should never need to happen. But if you run Docker Swarm long enough, it WILL happen to you. And you WILL plunge into a Hell the scope of which is precisely defined by the size and scope of your containerization empire. In our case, this was half a night in Hell. […] This event was the last straw for us. Moving to Kubernetes. Good luck to you hardy souls staying on Docker Swarm!

Sorry, if this seems like like Docker Swarm bashing. K8s has it's own issues, for sure! But at least there is a big community to turn to for help, if things to sideways.

[1]: https://github.com/docker/cli/issues/2527 [2]: https://github.com/moby/moby/issues/37338 [3]: https://github.com/docker/compose/issues/8561#issuecomment-1... [4]: https://github.com/moby/moby/issues/34384