by daxfohl 2962 days ago
But there's this from a fairly influential former googler: https://medium.com/@steve.yegge/honestly-i-cant-stand-k8s-48...

It's a quick read, but to summarise: it's almost as complex as Google's internal Borg system but the benefit isn't even close (partially because nobody else has Google problems). I can't knock K8s personally, as I've never used it myself. But I wonder if there's the possibility of a system that's more 80/20 of Google's Borg rather than the seeming 20/80 coming from K8s. And I wonder if Grab will release it next year.

5 comments

I don't think that's true. If you believe the team that designed Kubernetes, Kubernetes is an attempt to improve on Borg.

Borg is an accumulation of a decade's work with containers at Google, and has been described by Googlers as rich but a little messy, having been designed incrementally over many years as needs have surfaced. Borg could never be open-sourced because it's so specific to Google; for example, it uses Google's own cgroups-based container tech, not Docker/OCI/etc. Omega, as I understand, was an effort to clean up Borg and modernize it, but apparently it was never put into production; instead, some of the innovations ended up being backported to Borg [1].

More importantly, Kubernetes is based roughly on the same design as Borg: A declarative, consistent object store, with controllers, schedulers and other bits and pieces orchestrating changes to the store, mediated by a node-local controller (Borglet/Kubelet). A major difference between Borg and Kubernetes is that with Borg, the object store is exposed to clients, whereas Kubernetes hides it behind an API. Another difference is the structure of containers; Borg's "allocs" are coarser-grained than pods and Borg is less strict about where things go, which googlers have described as a shortcoming compared to Kubernetes' strict pod/container structure. Another difference, also seen as a shortcoming, is that Borg lacks Kubernetes' one-IP-per-pod system; all apps on Borg apparently share the host's network interface. Kubernetes also innovates on Borg in several ways; for example, Borg doesn't have labels [2].
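To make the "declarative store plus controllers" idea concrete, here is a minimal sketch of a reconcile loop. It is not how Borg or Kubernetes are actually implemented; the `Store` class, the `reconcile` function and the app names are all made up for illustration, and the sketch assumes the node agent applies changes instantly.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for the declarative object store."""
    desired: dict[str, int] = field(default_factory=dict)   # app name -> wanted replicas
    observed: dict[str, int] = field(default_factory=dict)  # app name -> running replicas


def reconcile(store: Store) -> list[str]:
    """One controller pass: diff desired against observed state and return
    the actions a node agent (the Borglet/Kubelet role) would carry out."""
    actions = []
    for app in sorted(store.desired.keys() | store.observed.keys()):
        want = store.desired.get(app, 0)
        have = store.observed.get(app, 0)
        if want > have:
            actions.append(f"start {want - have} x {app}")
        elif want < have:
            actions.append(f"stop {have - want} x {app}")
        # Assumption: the node agent applies the change immediately.
        if want:
            store.observed[app] = want
        else:
            store.observed.pop(app, None)
    return actions


store = Store(desired={"web": 3, "worker": 1}, observed={"web": 1, "cron": 2})
first_pass = reconcile(store)   # converges observed state toward desired state
second_pass = reconcile(store)  # nothing left to do once the two agree
print(first_pass)
print(second_pass)
```

The point is that clients only edit the desired state; the controller keeps re-running the same diff until there is nothing left to change, which is why the second pass returns no actions.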

Borg, from what I gather, scales much further than Kubernetes at this point, but it's really not related to the design. The design is fundamentally the same.

Yegge's criticisms are too handwavy ("overcomplicated") to counter, but I don't think Yegge knows what he's talking about here. As for "benefit": Not sure what you mean by this, but Kubernetes arguably comes with benefits — declarative ops, platform abstractions, container isolation — even if you're just running a single node. The notion that you only need Kubernetes if you have "Google-scale problems" is just nonsense.

PS. What's "Grab"?

[1] https://ai.google/research/pubs/pub44843 (I recommend reading this paper)

[2] https://kubernetes.io/blog/2015/04/borg-predecessor-to-kuber...

All of the things you described as improvements are more complexity and layers of indirection. Kubernetes may be an attempt to improve on Borg but adding on a bunch of features and plugin architectures to solve more use cases isn't necessarily an improvement.
Is your assertion that k8s suffers from "second system syndrome" when compared to Borg?
Kubernetes is actually the third system. In between, there's Omega. Yegge must never have set up a GSLB or GFE service, or he would appreciate the extra stuff that Kubernetes offers.
Thanks for the detailed insight. I've been following k8s for a while now but never had a need, and Yegge's post put an end to it in my mind.

Grab is the company Yegge left Google for. He always complained about Google's inability to platformize, so my random hunch is that he'll instill this desire into Grab. But that's entirely a guess. I also don't know how influential he was inside Google vs outside.

Kubernetes does suffer from complexity right now, but that also gives it its flexibility.

I think turnkey Kubernetes solutions like Rancher will dominate for a lot of use cases, especially for individual devs and small teams that can't have a dedicated DevOps resource to manage Kubernetes.

As someone who has been running Kubernetes clusters for the last 1.5-2 years, I find it's quickly becoming boring (in a good way). I'm at a fairly large enterprise and we're running five clusters (two GKE-managed, which... is limiting, and three with our own bootstrap). I'd say we spend about 10%-20% of our time actively managing clusters (upgrades, troubleshooting; some of that time goes to maintaining value-adds like automatic log shipping, that kind of thing). Most of our work is writing value-adds, helping customers onboard and consulting with teams on how to best deploy and work with Kubernetes (many teams at my company aren't really familiar with containerization).

Granted, we run our own bootstrap (when we started, none of the bootstraps floating around were ready, so we started with a Terraform/make implementation of Hightower's "Kubernetes the Hard Way" and we've just kept adding). We're thinking about revisiting the space again.

I didn't know he was still writing.
- K8s is not a clone of Borg
Every tech company with thousands of employees has Google problems.
I highly doubt it. Even most of Google doesn't have "google problems". (Having to design things for "google scale" when you have no reason to was a popular gripe while I was there.)

You don't go Google scale because it's cool to. You either do it because you absolutely have to, or you don't because, thankfully, you don't have to.

It sure is trendy to think so though!

I'm currently supporting ops of some "eventually consistent, globally distributed" (if you can fathom through the previous engineer's algorithms) built-from-scratch system designed to be resilient in the face of multi-megaX-transactions-per-second, that currently contains all of 8K records.

Whereas stackoverflow.com runs on SQL. Their own hardware at that.

I half wonder if a large chunk of AWS's revenue could be replaced by https://www.amazon.co.uk/electrical-sockets-adhesive-sticker...

Companies don't go Google scale; companies are Google scale. It's not a choice, it's a fact they have to deal with.

Pick any large tech company: they have thousands of servers and countless pieces of customized software running. That's what is meant by Google scale.

They need to manage that and they suck at it most of the time: they don't know what resources they own, and they can't figure out what's running or where.