by thockingoog 3645 days ago
It's funny how words can be played. The Kubernetes "master" is a set of 1 or more machines that run the API server and associated control logic. This is exactly what systems like Docker Swarm do, but they wrap it in terms like Raft and gossip that make people weak in the knees. Kubernetes has Raft in the form of the storage API (etcd). This is a model that has been PROVEN to work well, and to scale well beyond what almost anyone will need.

"Federation" in this context is across clusters, which is not something other systems really do much of, yet. You certainly don't want to gossip on this layer.

"evaluating replacing" really does imply "kicking the tires". Put another way - how much energy are you willing to invest in the early stages of your evaluation? If a "real" cluster took 3 person-days to set up, but a "quick" cluster took 10 person-minutes, would you use the quick one for the initial eval? Feedback we have gotten repeatedly was "it is too hard to set up a real cluster when I don't even know if I want it".

There are a bunch of facets of streamlining that we're working on right now, but they are all serving the purposes of reducing initial investment and increasing transparency.

> how easy it is to get the new Docker orchestration running

This is exactly my point above. You don't think that their demos give you a fully operational, secured, optimized cluster with best-perf networking, storage, load-balancing etc, do you? Of course not. It sets up the "kick the tires" cluster.

As for AWS - it is something we will keep working on. We know our docs here are not great. We sure could use help tidying them up and making them better. We are just BURIED in things to do.

Thanks for the feedback, truly.


I would consider "kicking the tires" to mean actually spinning up a cluster and playing with it. One can also evaluate by reading documentation and others' reports of issues to look for show-stopping problems. For instance, a couple of releases ago there was no multi-AZ support. The word on the street at that time was to create multiple clusters and do higher-level orchestration across them. That was a no-go for us; no need to "kick the tires".

Whatever you may think of my level of knowledge or weak knees for consensus and gossip protocols, these problems (perceived or otherwise) with setup, documentation, and management seem pretty widely reported.

EDIT: I hope this doesn't sound too negative. Kubernetes IS getting better all the time. I only write this to give a perspective from somebody who would like to use Kubernetes but has reason for pause. Our requirements are likely not standard; our internal bar for automation and ease of use is quite high. We essentially have an internal, hand-rolled, Docker-based PaaS with support for ad-hoc environment creation (not just staging/prod). We would like to move away from holding the bag on our hand-rolled stuff and adopt a scheduler :) Deciding to pull the trigger on any scheduler, though, would commit us to a rather large amount of integration effort to reach a parity that doesn't seem riddled with regressions over the current solution.

This frustrates me greatly, only because I agree with you so vehemently :-) We have an open issue tracker where real decisions are made, and so engineers will argue about different approaches. Compare to alternatives, where you see demos that are double-acts of good-cop vs good-cop where apparently there are no trade-offs and everything is perfect. It isn't my experience that products where the debates are hidden are better; it is certainly easier to see the compromises when the debates are public.

So: there was a big discussion about whether a single k8s cluster should span multiple AZs (which shipped in 1.2), or whether we should allow the API to target multiple independent clusters (federation, the first version of which is shipping in 1.3). The core of the argument is that multi-zone is simpler for most users, but with only one control plane it is less reliable than a federation of totally independent clusters. Federation also brings other benefits, like solving the problem of running in clusters that are not in a single "datacenter", i.e. where you need to worry about non-uniform latency. I haven't seen anyone else make a serious attempt at solving this.
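To put a rough shape on the reliability half of that argument, here is a back-of-the-envelope sketch. It assumes control planes fail independently and all have the same availability, and it ignores the federation layer itself and any correlated failures. The function name and the 0.99 figure are made up for illustration; they are not measurements of anything.

```python
def any_control_plane_up(availability: float, clusters: int) -> float:
    """Probability that at least one of `clusters` control planes is up.

    Assumes each control plane is up with the same probability
    `availability` and that failures are independent (a simplification).
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if clusters < 1:
        raise ValueError("clusters must be at least 1")
    # All control planes are down with probability (1 - a) ** n.
    return 1.0 - (1.0 - availability) ** clusters


# Hypothetical numbers: one multi-zone cluster vs. three federated clusters.
single = any_control_plane_up(0.99, clusters=1)
federated = any_control_plane_up(0.99, clusters=3)
print(f"single control plane: {single:.6f}")
print(f"three independent:    {federated:.6f}")
```

The point is only the shape of the curve: under these assumptions, each extra independent control plane multiplies the chance of a total outage by (1 - availability). The cost is the added complexity of running several clusters.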

So, remember that the issue tracker is filled with the unvarnished discussions that come from true open source development. I think it is an asset for you, because you don't discover those things 3 months into using your chosen product; but it is definitely a liability for k8s, because we rely on you realizing this in your initial evaluation and weighting appropriately (the devil you know vs the devil you don't). I think k8s is likely much better than you think it is, and you should come talk to us on slack and make sure of that fact! It certainly sounds like you have an interesting use case that we'd like to hear about and consider.

But yes, our docs should be better!

You might take a look at Rancher - it integrates and fully automates Kubernetes deployment, but personally, I find their Cattle scheduler is much easier to reason about, supports multi-AZ out of the box, and supports all of the features you would want (DNS-based service discovery, encrypted overlay networking, etc.)

Regarding the multi-AZ support issue - this is mostly because an EBS volume can only be attached to EC2 instances in the same AZ, and since Kubernetes has great support for persistent data volumes, you're pretty much limited to a single AZ if you're using persistent data volumes and want them to be remounted on a different instance in case of a failure. I think a more viable solution for persistent data volumes is to leverage EFS and use Convoy NFS to mount them. Now you have highly available, scalable, persistent data volumes, and you can stretch your cluster across multiple AZs.
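A toy sketch of the EBS constraint, in case it helps: a volume lives in one AZ, so the only instances a pod using it can be moved to are healthy ones in that same AZ. This is plain Python modelling the rule, not the AWS or Kubernetes API; the class names, instance names, and zone names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    zone: str  # an EBS volume exists in exactly one AZ


@dataclass(frozen=True)
class Instance:
    name: str
    zone: str
    healthy: bool = True


def attachable_instances(volume, instances):
    """Return the healthy instances the volume could be re-attached to.

    Models only the rule that a volume can attach to instances
    in its own AZ; everything else is ignored.
    """
    return [i for i in instances if i.healthy and i.zone == volume.zone]


volume = Volume("db-data", zone="zone-a")

# One instance in zone-a has failed: the pod can still move to a2.
fleet = [
    Instance("a1", "zone-a", healthy=False),
    Instance("a2", "zone-a"),
    Instance("b1", "zone-b"),
]
print([i.name for i in attachable_instances(volume, fleet)])

# All of zone-a is down: nothing left to attach to, even though b1 is healthy.
zone_outage = [
    Instance("a1", "zone-a", healthy=False),
    Instance("a2", "zone-a", healthy=False),
    Instance("b1", "zone-b"),
]
print([i.name for i in attachable_instances(volume, zone_outage)])
```

The second case is the one that bites: the healthy instance in the other zone is no use for a zone-bound volume.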

In this case, what you would do is set up two separate clusters, and spread an ELB across them. No federation required :)

Disclosure: I work at Google on Kubernetes

But, if you have persistent EBS volumes, you wouldn't be able to mount them on the other cluster if you had a failure of an entire AZ.