| HN Mirror


	by thinkersilver 4076 days ago
	Kubernetes is going to become the standard API for container orchestration, only because there are no other tools out there trying to do as much. There was a vacuum around container orchestration tooling and Google got there first. Kubernetes components can be swapped out for other community-driven efforts; take Mesos as an example, which can be used to replace the default k8s scheduler. With k8s you can avoid lock-in to a single cloud provider. I think Google is hoping that we end up on their cloud platform, but it's nice to see that it is being built from the ground up to be used with other cloud platforms.

1 comments

InTheArena 4076 days ago

I'm really not sold on this yet. We've done a number of projects testing and using Kubernetes, CoreOS, Mesos and, most recently, Docker's Swarm. It's been interesting to see how and why the technology space is evolving, but a couple of general thoughts: 1) The concept of the container as a primitive, especially the thorough implementation that Docker put together, is extraordinarily powerful. 2) The Swarm idea, which provides an API matching the Docker API, is a really neat idea, even if it lacks the HA and scheduling functions to really make things work well. 3) I think the next evolution is really to iron out the network stack here. Kubernetes needs flannel in most circumstances, and the process is not as seamless or as simple as Docker's.

I'd also love to know the split between this, Omega, and Kubernetes at Google.

jamesblonde 4076 days ago

I have heard that Borg is still quite widely used at Google, and Omega hasn't taken over as had previously been expected. Omega is a distributed scheduler, and we can only speculate as to why it hasn't taken over. My speculation would be that the optimistic concurrency control in Omega leads to storms at very high loads: attempts to allocate containers that need to be rolled back because of contention. With pessimistic concurrency control (PCC), you still make progress at high loads.
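
To make the contention argument concrete, here is a toy sketch. It is not Omega's or Borg's actual algorithm; the function names, slot counts, and the round-based model are all made up for illustration. In the optimistic variant every scheduler picks from the same snapshot and loses its work if another scheduler committed that slot first. In the pessimistic variant schedulers take turns under a lock and always see fresh state.

```python
import random


def optimistic_round(free_slots, n_schedulers, rng):
    """All schedulers decide from one shared snapshot, then try to commit.

    Returns (commits, rollbacks) for this round. Hypothetical model.
    """
    snapshot = sorted(free_slots)  # stale view shared by every scheduler
    picks = [rng.choice(snapshot) for _ in range(n_schedulers)]
    commits = rollbacks = 0
    for slot in picks:
        if slot in free_slots:  # commit succeeds only if the slot is still free
            free_slots.remove(slot)
            commits += 1
        else:  # somebody else won the slot: the work is thrown away
            rollbacks += 1
    return commits, rollbacks


def pessimistic_round(free_slots, n_schedulers, rng):
    """Schedulers run one at a time (as if holding a lock) on fresh state."""
    commits = 0
    for _ in range(n_schedulers):
        if not free_slots:
            break
        slot = rng.choice(sorted(free_slots))  # always picks a slot that is free
        free_slots.remove(slot)
        commits += 1
    return commits, 0  # never needs to roll back


def simulate(round_fn, n_slots=100, n_schedulers=20, seed=0):
    """Run rounds until every slot is allocated and return the totals."""
    rng = random.Random(seed)
    free_slots = set(range(n_slots))
    totals = {"commits": 0, "rollbacks": 0, "rounds": 0}
    while free_slots:
        commits, rollbacks = round_fn(free_slots, n_schedulers, rng)
        totals["commits"] += commits
        totals["rollbacks"] += rollbacks
        totals["rounds"] += 1
    return totals


if __name__ == "__main__":
    print("optimistic :", simulate(optimistic_round))
    print("pessimistic:", simulate(pessimistic_round))
```

Comparing the two runs, the optimistic one needs rollbacks, and they tend to pile up as free slots get scarce, while the locked one never rolls back. The price of the lock (schedulers waiting on each other) is not modelled here, so this only illustrates the "storm" side of the trade-off.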

nostrademons 4076 days ago

There's also just plain legacy inertia. Many existing systems are on Borg; many of their dependencies are on Borg; most Google engineers are much more familiar with Borg than Omega, and the teammates they might ask for help & advice are also more familiar with Borg than Omega.

Think of how long the Python 2->3 transition has taken (outside Google, not speaking in Google terms anymore). It's been six years, and we're only now reaching the point where Python 3 may be a better choice for green-field projects than Python 2, and Python 3 may never be a better choice for legacy installs. The Borg -> Omega transition has a similar dependency issue (everything runs in the cloud at Google), the learning curve is worse than Python 2->3, and all of Google's code is legacy. That's independent of any technical differences between them, and also irrelevant to whether an organization just getting onto the cloud would be better off with Docker, Mesos, or Kubernetes.

jamesblonde 4076 days ago

That's an issue, I guess, for many apps. However, Google tends to make company-wide technical decisions, and then the entire engineering crowd goes there. How many other companies have one SCM instance? None. If there were irrefutable economic gains to be made by moving to Omega today, my guess is they would do it.

The technically interesting question is whether decentralized scheduling at large scale is a solved problem or not. Can we do it better than centralized scheduling today?

nostrademons 4076 days ago

Google absolutely does not make company-wide technical decisions and then the entire engineering crowd goes there. Rather, they make company-wide technical decisions, and over a period of 3-5 years the entire engineering crowd gradually gets there. As we used to say: "There are two ways to do everything at Google: the deprecated one and the one that doesn't work yet." In some cases I've seen up to 3 deprecated systems in flight, plus one that doesn't work yet. Borg's predecessor was finally removed from production shortly before I left in 2014, despite being deprecated around 2005.

brendandburns 4076 days ago

I'd encourage you to check out:

https://github.com/GoogleCloudPlatform/kubernetes/blob/maste...

It's a fairly straightforward getting started experience.

Also, if you want to turn up a cluster in a cloud provider, it's as simple as https://get.k8s.io

mkulke 4076 days ago

Agreed. I am deeply impressed by how "approachable" Kubernetes actually is, considering what it does. The overall design concepts are quite simple and the reasoning behind them is clear. It's a small set of self-contained components (API, controller, scheduler, kubelet, and proxy, sitting on CoreOS's etcd), so the complexity is fairly manageable. Peeking into the source code of the components won't give you the creeps, and the build system (cross-compiling) could not be any easier.
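
The division of labour between those components can be sketched as a toy reconciliation loop. Everything here is hypothetical: the `store` dict stands in for etcd, the function names are invented, and none of this is the real Kubernetes API. The proxy is left out.

```python
def new_store(nodes, desired_replicas):
    """Shared state that every component reads and writes (etcd stand-in)."""
    return {
        "nodes": list(nodes),
        "desired_replicas": desired_replicas,
        "pods": {},  # pod name -> {"node": str or None, "running": bool}
        "next_id": 0,
    }


def controller(store):
    """Create or delete pod records until the count matches the desired one."""
    pods = store["pods"]
    while len(pods) < store["desired_replicas"]:
        name = f"pod-{store['next_id']}"
        store["next_id"] += 1
        pods[name] = {"node": None, "running": False}
    while len(pods) > store["desired_replicas"]:
        pods.pop(sorted(pods)[-1])  # drop the pod whose name sorts last


def scheduler(store):
    """Bind each unassigned pod to the node currently holding the fewest pods."""
    load = {node: 0 for node in store["nodes"]}
    for pod in store["pods"].values():
        if pod["node"] is not None:
            load[pod["node"]] += 1
    for name in sorted(store["pods"]):
        pod = store["pods"][name]
        if pod["node"] is None:
            target = min(sorted(load), key=load.get)  # ties go to first name
            pod["node"] = target
            load[target] += 1


def kubelet(store, node):
    """Per-node agent: 'start' every pod that was bound to this node."""
    for pod in store["pods"].values():
        if pod["node"] == node:
            pod["running"] = True


def reconcile(store):
    """One pass of every component over the shared store."""
    controller(store)
    scheduler(store)
    for node in store["nodes"]:
        kubelet(store, node)


if __name__ == "__main__":
    store = new_store(["node-a", "node-b"], desired_replicas=4)
    reconcile(store)
    for name, pod in sorted(store["pods"].items()):
        print(name, pod)
```

The point of the sketch is only that each piece is small and talks to the others through shared state rather than directly, which is roughly why the whole thing stays manageable.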

I have not yet tried any other Docker orchestration framework (there seem to be a few popping up right now), but concerning clustering: in comparison, Mesos appears intimidating to me (there is certainly not the 2-minute "I get this" experience I've had with tools like etcd & Kubernetes). I also remember building clusters with technology like heartbeat, corosync, openais & drbd not so long ago; compared to that, distributed computing has become incredibly easy.

My advice for starters would be to pick some ready-to-go vagrant-coreos setup and get it running on your workstation; this should be pretty straightforward. (We are running k8s on OpenStack/Rackspace, and there were too many moving parts involved to get the included starter scripts to reliably bootstrap a Kubernetes installation.)

Then look at the user-data/cloud-init of that project and try to rebuild things on your preferred stack from the bottom up, step by step; I feel a lot more in control when doing that. The components' logfiles are actually helpful when you assemble things. It also helps to look at the generated (and documented, thx for this) iptables NAT rules when you have problems with service discovery/communication.
