Hacker News
by matt_wulfeck 3727 days ago
> While we heavily utilise Helios for container-based continuous integration and deployment (CI/CD) each machine typically has a single role – i.e. most machines run a single instance of a microservice.

It's strange to me that this is still so common. My theory is that the "one machine, one port" philosophy is still built into a lot of software (monitoring, the ELB, etc.). Another is that this is simply the philosophy we've always known.

Take a look at Kubernetes. Everything is accessible via localhost:<some port>. That breaks most home-built and enterprise orchestration and monitoring tools spectacularly, even though it's a much simpler model (everything is a port, not an IP:port combo).

Density is much easier to accomplish on larger machines with more cores, which are elastic in the face of bursty residents. They are also generally cheaper per unit of compute or memory.

3 comments

All of those things are doing gymnastics with ports because nobody can be bothered to ship IPv6. If you can bring v6 up you can assign every process an IP and start assuming ports (80 is the service via HTTP, 443 via TLS, 8080 via HTTP/2 gRPC, 9000 for monitoring, and so on). It's way cleaner than all the work around ports in the current state of the art and means you can Just Use DNS in a number of scenarios. There are whole systems around ports in pretty much every orchestration system and it's such an antipattern, really. Half of Docker's networking stack, a bunch of Kubernetes logic, Flannel: all of it becomes unnecessary. They represent attempts to jam the right way into the limited IPs and limited address-table space of today's infrastructure.
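To make the "an IP per process, assumed ports" idea concrete, here is a minimal sketch using Python's standard `ipaddress` module. The prefix, the index-based allocation scheme, and the helper names `process_address` and `endpoint` are all assumptions for illustration, not how any particular orchestrator does it; the port table just mirrors the examples in the comment above.

```python
import ipaddress

# Port conventions taken from the comment above: the port names the
# protocol, while the address names the process.
WELL_KNOWN_PORTS = {"http": 80, "tls": 443, "grpc": 8080, "monitoring": 9000}


def process_address(prefix: str, index: int) -> ipaddress.IPv6Address:
    """Return the index-th address inside a host's IPv6 prefix.

    Hypothetical allocation scheme: each process on the host simply gets
    the next address in the prefix. Index 0 is skipped because it is the
    subnet-router anycast address.
    """
    network = ipaddress.IPv6Network(prefix)
    if not 0 < index < network.num_addresses:
        raise ValueError(f"index {index} is outside {network}")
    return network[index]


def endpoint(prefix: str, index: int, protocol: str) -> str:
    """Format a URL-style "[address]:port" endpoint for one process."""
    return f"[{process_address(prefix, index)}]:{WELL_KNOWN_PORTS[protocol]}"


if __name__ == "__main__":
    # 2001:db8::/32 is the documentation range; substitute a real prefix.
    for proto in WELL_KNOWN_PORTS:
        print(proto, endpoint("2001:db8::/64", 7, proto))
```

With a scheme like this, a DNS AAAA record per process is enough for discovery, since the port is implied by the protocol and never has to be looked up.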

IPv6 is practically built for containers, and, to Kubernetes's credit, they architected with that in mind. (Learned from BNS.) Weirdly, what I'm saying here was the original idea behind ports in the first place. There just aren't enough of them, particularly when half your space is shared with client sockets.

I want a world where v4 is pretty much just my control plane into the v6 cluster, since I'll die before IPv4 does. Google and, far more importantly, Amazon need to come up with a v6 story in their cloud offerings already. AWS has had a decade. This isn't just blind advocacy any more; the orchestration and software side is starting to build entire parts of the OSI stack because the network side of our industry is stuck without any sign of moving, no matter how dire the v4 situation.

It's strange (or perhaps rather unfortunate) to me as an SRE at Spotify as well. Helios is in many ways similar to Kube, so it was our hope that eventually it would lead us to scheduling multiple service containers per physical machine. We certainly have the service discovery framework to support that model.

However, given Spotify's business position, our priority has yet to shift from providing engineers compute capacity as fast as possible to optimising our usage of said compute capacity. It's all now somewhat of a moot point as we move away from our own hardware into Google's cloud.

But less predictable in terms of overall performance due to all the shared components.

It's also harder to separate two or more processes that 'grew up' together in the same container/machine/VM.