| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by colohan 2893 days ago

In this thread there is a repeated meme of "Borg is way more scalable than Kubernetes, and will always be so".

But this ignores a lot of the history of Borg. When Borg was first created, it was not nearly as scalable as its current incarnation. We hit scalability bugs and limitations all the time! (I was working on a team which was exploring the scalability limits of MapReduce, which was often very good at finding the limits in Borg and other systems it interacted with.)

Over the years many many Borg engineers have taken on many projects, both in solving bugs and rearchitecting major pieces of Borg with the intention of making it scale better (to run more jobs at once, utilize machines better, increase the degree of failure and performance isolation between jobs, and scale up to manage larger clusters of machines). Many of the lessons learned went into the design of Kubernetes, but Kubernetes is still much newer than Borg, which means it has fewer years of the "identify a scalability bug and squash it" feedback loop.

What is really needed to drive that loop is a major customer pushing the boundaries of scalability and identifying bugs. My guess (from the outside) is that the main users of Kubernetes have been pushing the limits in other directions, which has meant the team has been prioritizing other things (such as improving usability, and adding features) in their development efforts.

1 comments

jsmthrowaway 2893 days ago

Borg will remain orders of magnitude beyond Kubernetes until Kubernetes is completely rearchitected. It’s not scalability bugs. It’s decisions regarding how the cluster maintains state that hamstring it, and that’s so fundamental to everything it’s not a find/squish loop.

As I said in my comment, those major customers (one personal experience, three anecdotally, eight or nine I’ve consulted with) have quietly ruled out Kubernetes, either by trying it or prying it apart and deciding not to try it. That feedback isn’t coming. At Borg scale, Kubernetes is very much considered a nonstarter.

link

davidopp__ 2893 days ago

> Borg will remain orders of magnitude beyond Kubernetes until Kubernetes is completely rearchitected. It’s not scalability bugs. It’s decisions regarding how the cluster maintains state that hamstring it, and that’s so fundamental to everything it’s not a find/squish loop.

Can you say more about this? Borgmaster uses Paxos for replicating checkpoint data, and etcd uses Raft for replicating the equivalent data, but these are really just two flavors of the same algorithm. I don't doubt that there are probably more efficient ways that Kubernetes could handle state (I don't claim to be an expert in that area), but I don't think they're approaches that would look any more like Borg than Kubernetes does.

If you're at liberty to do so, could you say what orchestrators the customers you mentioned chose in lieu of Kubernetes? What scale are they running at for a single cluster?

[Disclaimer: I work on Kubernetes/GKE at Google.]

link

dilyevsky 2891 days ago

> It’s decisions regarding how the cluster maintains state that hamstring it

Jed, you keep repeating this like it's true, but it's not actually so. Here's an excerpt from Borg paper (which David co-authored btw ;-)):

> A single elected master per cell serves both as the Paxos leader and the state mutator, handling all operations that change the cell’s state, such as submitting a job or terminating a task on a machine.

And while we're at it, I don't know what it has to do with FauxMaster since it ran single replica and the passage about C++ is just pure fud.

link

elvinyung 2890 days ago

Just curious, does Borgmaster use Chubby, or is it a completely separate Paxos store?

link

dilyevsky 2890 days ago

It’s using Chubby for locking (it’s actually next sentence in that Borg paper) and some othe things not related to quorum that i cant go into. This is different from kube master that uses etcd for everything but in terms of performance it’s not a big deal because elections dont happen often (and youd be surprised how many ppl run k8s with a single master setup, even GKE)

link