by elvinyung 2894 days ago
Dumb question: why does K8s use a centralized architecture like Borg's, if the perf gains from decentralized scheduling (an Omega-style shared-state scheduler, and maybe a Mesos-style two-level scheduler for batch with multiple frameworks) were already known, and Omega was already being folded back into Borg?

Is this related to (I'm assuming) the fact that K8s was originally architected "mostly" with service rather than batch in mind, and a monolithic scheduler was "good enough"?

(Disclaimer: I haven't really followed K8s stuff in the last few months. How is multi-scheduler support for K8s nowadays, anyways?)


You can actually build an Omega vertical / Mesos framework architecture on Kubernetes, as described in this doc[1]. That doc pre-dated CRDs; the way you'd do it today is to build the application lifecycle management part of the framework using a CRD + controller, and run an application-specific scheduler (for pods created by that controller) alongside the default scheduler. The Kubernetes documentation page explaining how to run multiple/custom schedulers is here[2].

Borg only worked with a single scheduler, but Kubernetes allows you to build Omega/Mesos style verticals/frameworks and associated scheduling as user extensions to the control plane (as described above).
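To make the multi-scheduler mechanism concrete, here is a minimal sketch in plain Python (no Kubernetes client library). It assumes only the pod API's `spec.schedulerName` field, which is how a pod says which scheduler should place it. The scheduler name `my-batch-scheduler`, the image names, and the helper functions are all hypothetical, and a real scheduler watches the API server rather than filtering an in-memory list.

```python
import json

# Name Kubernetes uses when a pod does not set spec.schedulerName.
DEFAULT_SCHEDULER = "default-scheduler"


def make_pod(name, image, scheduler_name=None):
    """Build a minimal Pod manifest as a plain dict (hypothetical helper).

    When scheduler_name is omitted the field is left out, so the pod
    belongs to the default scheduler.
    """
    spec = {"containers": [{"name": name, "image": image}]}
    if scheduler_name is not None:
        spec["schedulerName"] = scheduler_name
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


def responsible_scheduler(pod):
    """Return the name of the scheduler that should place this pod."""
    return pod["spec"].get("schedulerName", DEFAULT_SCHEDULER)


def pods_for(scheduler_name, pods):
    """Unscheduled pods (no spec.nodeName yet) owned by the named scheduler."""
    return [
        p for p in pods
        if "nodeName" not in p["spec"]
        and responsible_scheduler(p) == scheduler_name
    ]


# A controller for a custom resource would create pods like the second one.
pods = [
    make_pod("web", "nginx"),
    make_pod("train-job", "example/trainer", scheduler_name="my-batch-scheduler"),
]

batch_names = [p["metadata"]["name"] for p in pods_for("my-batch-scheduler", pods)]
default_names = [p["metadata"]["name"] for p in pods_for(DEFAULT_SCHEDULER, pods)]

# The manifest a custom controller might submit for its pods.
print(json.dumps(pods[1], indent=2))
```

Each scheduler only considers pods that name it, so the application-specific scheduler and the default scheduler can run side by side without fighting over the same pods.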

[1] https://github.com/kubernetes/community/blob/master/contribu...

[2] https://kubernetes.io/docs/tasks/administer-cluster/configur...

[Disclaimer: I work on Kubernetes/GKE at Google.]

> Borg only worked with a single scheduler

No love for rescheduler? =(

The rescheduler in Borg isn't a scheduler -- it just evicts pods, and then they go into the regular scheduler's pending queue and the regular scheduler decides where to schedule them. (At least that's how it worked at the time I left the project -- I assume it hasn't changed in this regard, but I don't know for sure.)

Because the name is confusing, we called the Kubernetes version of the Borg rescheduler the "descheduler" (https://github.com/kubernetes-incubator/descheduler) to make it clear that it doesn't actually schedule, just evicts. (There actually is something in Kubernetes called the "rescheduler" (https://kubernetes.io/docs/tasks/administer-cluster/guarante...) but it's a long story and we never should have named it that).
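The split between evicting and scheduling can be illustrated with a toy model. This is a sketch under loud assumptions, not the actual descheduler or scheduler code: the `Cluster` class, the `max_per_node` eviction policy, and the least-loaded placement rule are all invented for illustration. The only point it mirrors from the discussion is that the descheduler evicts and the regular scheduler decides placement from its pending queue.

```python
from collections import deque


class Cluster:
    """Toy cluster state (hypothetical; not a Kubernetes API)."""

    def __init__(self, node_capacity):
        self.capacity = dict(node_capacity)  # node -> max number of pods
        self.placement = {}                  # pod -> node
        self.pending = deque()               # pods waiting for the scheduler

    def load(self, node):
        return sum(1 for n in self.placement.values() if n == node)


def deschedule(cluster, max_per_node):
    """Evict pods beyond max_per_node on each node. Never chooses a node."""
    evicted = []
    for node in cluster.capacity:
        pods_on_node = [p for p, n in cluster.placement.items() if n == node]
        for pod in pods_on_node[max_per_node:]:
            del cluster.placement[pod]
            cluster.pending.append(pod)  # hand the pod back to the scheduler
            evicted.append(pod)
    return evicted


def schedule(cluster):
    """Place each pending pod on the least-loaded node with free capacity."""
    still_pending = deque()
    while cluster.pending:
        pod = cluster.pending.popleft()
        candidates = [
            n for n in cluster.capacity
            if cluster.load(n) < cluster.capacity[n]
        ]
        if candidates:
            cluster.placement[pod] = min(candidates, key=cluster.load)
        else:
            still_pending.append(pod)  # nothing fits; stay pending
    cluster.pending = still_pending


cluster = Cluster({"node-a": 4, "node-b": 4})
cluster.placement = {"p1": "node-a", "p2": "node-a", "p3": "node-a", "p4": "node-b"}

evicted = deschedule(cluster, max_per_node=2)  # only evicts
schedule(cluster)                              # only the scheduler places
```

In this run the descheduler evicts `p3` from the crowded `node-a`, and the scheduler, not the descheduler, then places it on `node-b`.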