
by lobster_johnson 3645 days ago

One thing I would say is that — because of the aforementioned documentation mess — it seems more daunting than it actually is. And the documentation does make it seem like a lot of work.

All you need to do, in broad strokes, is:

* Set up a VPC. Defaults work.

* Create an AWS instance. Make sure it has a dedicated IAM role that has a policy like this [1], so that it can do things like create ELBs.

* Install Kubernetes from binary packages. I've been using Kismatic's Debian/Ubuntu packages [2], which are nice.

* Install Docker >= 1.9 < 1.10 (apparently).

* Install etcd.

* Make sure your AWS instance has a sane MTU ("sudo ifconfig eth0 mtu 1500"). AWS uses jumbo frames by default [3], which I found does not work with Docker Hub (even though it's also on AWS).

* Edit /etc/default/docker to disable its iptables magic and use the Kubernetes bridge, which Kubelet will eventually create for you on startup:

   DOCKER_OPTS="--iptables=false --ip-masq=false --bridge=cbr0"

* Decide which CIDR ranges to use for pods and services. You can carve a /24 from your VPC subnet for each. They have to be non-overlapping ranges.

* Edit the /etc/default/kube* configs to set DAEMON_ARGS in each. Read the help page for each daemon to see what flags they take. Most have sane defaults or are ignorable, but you'll need some specific ones [4].

* Start etcd, Docker and all the Kubernetes daemons.

* Verify it's working with something like: kubectl run test --image=dockercloud/hello-world

Unless I'm forgetting something, that's basically it for one master node. For multiple nodes, you'll have to run Kubelet on each. You can run as many masters (kube-apiserver) as you want, and they'll use etcd leases to ensure that only one is active.
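For the CIDR step in the list above, a quick sanity check before writing the ranges into the kube* configs can save some debugging. This is only a minimal Python sketch using the standard ipaddress module; the function name and the example ranges are made up for illustration, so substitute your own.

```python
import ipaddress


def find_overlaps(ranges):
    """Return sorted pairs of names whose CIDR ranges overlap.

    `ranges` maps a label (e.g. "pods") to a CIDR string.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical ranges: two /24s carved out of a 172.16.0.0/16 VPC subnet.
ranges = {"pods": "172.16.10.0/24", "services": "172.16.11.0/24"}
print(find_overlaps(ranges) or "no overlaps")
```

An empty result means the pod and service ranges can coexist; any pair it returns needs to be re-planned before starting the daemons.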

[1] https://gist.github.com/atombender/3f9ba857590ea98d18163e983...

[2] http://repos.kismatic.com/debian/

[3] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_m...

[4] https://gist.github.com/atombender/e72c2acc2d30b0965543273a2...

4 comments

chrissnell 3645 days ago

You're making things really hard on yourself. Boot your nodes with CoreOS and it provides almost everything you need (except Kubernetes itself) out-of-the-box. It all works really well together and you get automatic updates, too. I can't imagine trying to run the cluster we run on Ubuntu, trying to roll my own Docker/etc/flannel installs.

lobster_johnson 3645 days ago

I'm sure CoreOS is nice, but we're currently on Ubuntu, and I'm trying to reduce the number of unknown factors and new technologies that we're bringing into the mix. Ubuntu is not the issue here. (FWIW, you don't need Flannel on AWS.)

piva00 3643 days ago

Can you expand a bit on why you don't need flannel on AWS? We're currently deploying a k8s cluster and I surely went the flannel route (following the steps of CoreOS guide to k8s) but it'd be nice to remove that setup from our deployment if possible.

lobster_johnson 3643 days ago

AWS has VPCs, allowing you to get a practically unlimited number of private subnets.

In some cloud environments (e.g. DigitalOcean), there's no private subnet shared between hosts, so Kubernetes can't just hand out unique IPs to pods and services. So you need something like Flannel, which can set up an overlay network using either UDP encapsulation or VXLAN.

Flannel also has a backend for AWS, but all it does is update the routing table for your VPC. Which can be useful, but can also be accomplished without Flannel. It's also limited to about 50 nodes [1] and only one subnet, as far as I know. I don't see the point of using it myself.

[1] https://github.com/coreos/flannel/issues/164

bboreham 3642 days ago

Could you say how you arrange that the addresses you pick for your pods do not clash with the addresses AWS picks for instances?

lobster_johnson 3642 days ago

Kubernetes allocates the pod IPs for you. For example, if your VPC subnet is 172.16.0.0/16, then you can tell K8s to use 10.0.0.0/16.

AWS won't know this IP range and won't route it. So K8s automatically populates your routing table with the routes every time a node changes or is added/removed.

K8s will give a /24 CIDR to each minion host, so the first will get 10.0.1.0/24, the next 10.0.2.0/24, and so on. Each pod will get 10.0.1.1, 10.0.1.2, etc.
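To make the arithmetic concrete, here is a rough Python sketch of carving a cluster range into per-node /24s and listing the first few addresses in one of them. It only illustrates the subnet math; the helper names are invented, the real allocation order is up to Kubernetes, and it does not model which address the bridge itself takes.

```python
import ipaddress
from itertools import islice


def node_subnets(cluster_cidr, count, prefix=24):
    """Return the first `count` per-node subnets carved from the cluster range."""
    net = ipaddress.ip_network(cluster_cidr)
    return list(islice(net.subnets(new_prefix=prefix), count))


def first_addresses(node_subnet, count):
    """Return the first `count` usable host addresses in a node's subnet."""
    return list(islice(node_subnet.hosts(), count))


subnets = node_subnets("10.0.0.0/16", 3)
for subnet in subnets:
    print(subnet)

# Addresses available inside the 10.0.1.0/24 block.
print([str(ip) for ip in first_addresses(subnets[1], 2)])
```

Note that a plain enumeration starts at 10.0.0.0/24, whereas the example above starts at 10.0.1.0/24. Either way, a /16 split into /24s gives 256 node ranges of up to 254 usable addresses each.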

Obviously having an additional IP/interface per box adds complexity, but I don't know if K8s supports any other automatic mode of operation on AWS.

(Note: Kubernetes expects AWS objects that it can control — security groups, instances, etc. — to be tagged with KubernetesCluster=<your cluster name>. This also applies to the routing table.)

lobster_johnson 3643 days ago

Forgot to say: Kubernetes will keep the routing table up to date if you use --allocate-node-cidrs=true. That way, it does exactly the same thing as Flannel with the "aws-vpc" backend.
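Conceptually, keeping the routing table up to date is a reconcile loop: compare the routes that should exist (one pod CIDR per node) with what is in the table, then add and delete the difference. The Python below is a toy model of that idea under my own assumptions, not Kubernetes' or Flannel's actual code; the instance IDs are made up, and it only considers routes the cluster owns.

```python
def reconcile_routes(current, desired):
    """Compare two {pod CIDR: instance ID} maps.

    Returns (to_add, to_delete). A route whose target instance changed
    appears in both: the stale entry is deleted and the new one added.
    """
    to_add = {
        cidr: inst for cidr, inst in desired.items() if current.get(cidr) != inst
    }
    to_delete = {
        cidr: inst for cidr, inst in current.items() if desired.get(cidr) != inst
    }
    return to_add, to_delete


# Hypothetical state: node i-bbb went away and node i-ccc joined.
current = {"10.0.1.0/24": "i-aaa", "10.0.2.0/24": "i-bbb"}
desired = {"10.0.1.0/24": "i-aaa", "10.0.3.0/24": "i-ccc"}

to_add, to_delete = reconcile_routes(current, desired)
print("add:", to_add)
print("delete:", to_delete)
```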

Rapzid 3645 days ago

Awesome, I'll take a look into all that! This doesn't look too bad. Do you know if you can combine the masters/minions? Our environments are VPC isolated, and we support ad-hoc creation so I'd like to keep server count requirements to a bare minimum. The current from-scratch guide says it is not necessary to make the distinction between master nodes and normal nodes; and the api, controller, etc. appear to be hosted as pods. This makes me happy and makes sense, but then you have something like this which has me confused: https://github.com/kubernetes/kubernetes/issues/23174 .

On a side note, it's pretty awesome how Docker embedded the key-value store into the main binary. Appears to reduce complexity quite a bit.

lobster_johnson 3645 days ago

You can run them on the same box just fine. There's nothing magical about any of those processes.

However, using dedicated masters (by which I mean mostly kube-apiserver) separate from worker nodes is a good idea to avoid high load impacting the API access.

(Just keep in mind that the Kismatic packages I referred to won't support this — you can't install kubernetes-master and kubernetes-node at the same time. But as you discovered, you can run everything except kubelet as pods. On the other hand, kube-apiserver needs a whole bunch of mounts as well as host networking, so to me it seems like you don't gain all that much.)

What is this Docker key-value store you mention?

Rapzid 3645 days ago

https://blog.docker.com/2016/06/docker-1-12-built-in-orchest...

They are using a Raft based store inside the engine now so there is no external etcd dependency. IIRC they are using etcd's raft implementation.

lobster_johnson 3645 days ago

Interesting, thanks. Personally, I think Docker is already too monolithic, and this just looks like it adds unnecessary coupling to something that should be less coupled in the first place. I'd prefer to use etcd.

I think rkt is making some good decisions and is worth keeping an eye on. Not sure I love the tight coupling to systemd, but it avoids the PID 1 problem and lets containers be their own processes (separate from the "engine", which can choreograph containers through the systemd API, building on all of its process handling logic), and both are improvements over Docker. In fact, rkt uses the same networking model as Kubernetes.

lobster_johnson 3643 days ago

Replying to myself: On Ubuntu Xenial you have to start Docker with this additional flag:

    --exec-opt native.cgroupdriver=cgroupfs

Since Xenial uses systemd, there's no longer an /etc/default/docker. Instead, create /etc/systemd/system/docker.service.d/docker.conf with:

      [Service]
      ExecStart=
      ExecStart=/usr/bin/docker daemon --exec-opt native.cgroupdriver=cgroupfs --iptables=false --bridge=cbr0 --ip-masq=false
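If you template this file from a provisioning script, the one detail worth getting right is the empty ExecStart= line: it clears the command inherited from the packaged unit, so the second ExecStart= replaces it rather than adding a second command. A small Python sketch that renders the same drop-in from a flag list follows; the function name is my own, and writing the result to disk and reloading systemd is left out.

```python
def render_dropin(binary, args):
    """Render a systemd drop-in that replaces the unit's ExecStart command."""
    lines = [
        "[Service]",
        "ExecStart=",  # empty assignment resets the inherited command list
        "ExecStart=" + " ".join([binary] + list(args)),
    ]
    return "\n".join(lines) + "\n"


# Flags taken from the comment above.
docker_args = [
    "daemon",
    "--exec-opt", "native.cgroupdriver=cgroupfs",
    "--iptables=false",
    "--bridge=cbr0",
    "--ip-masq=false",
]

print(render_dropin("/usr/bin/docker", docker_args), end="")
```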

TheIronYuppie 3641 days ago

Quick question - if you're using AWS (or GCP or Azure), was there a reason that:

  ./kube-up.sh

Didn't work for you?

Disclosure: I work at Google on Kubernetes

lobster_johnson 3641 days ago

Did you read my earlier comment (https://news.ycombinator.com/item?id=12024148)?

In short, I want and need to understand how it's put together so that I can use it.

There was someone on the #kubernetes-novices Slack today [1] who described his approach as: run kube-up, then try to deconstruct everything that kube-up did into a repeatable recipe. I went the other route, by trying to understand what kube-up did and replicating it. I'm still working through things I missed or did wrong.

To be honest, I think Google's approach here is wrong. Kubernetes is being developed at a frenetic pace, but documentation is not being maintained (it's pretty lacking even if you're on GCP!), and users are understandably frustrated with the obscurity of the whole thing. It works, but it takes weeks to gather enough of an understanding of the system, and that's entirely due to lack of documentation.

The documentation is lacking at both a high and a low level. At no point does the documentation offer a big-picture view of how everything works together, nor does it offer low-level descriptions of the stack.

I also think the strong focus on kube-up is a mistake, given the lack of docs. I'm sure it works great, but it's not an option for production use, in my opinion. Terraform would have been better here. You're also using Salt — honestly, it would have been so much cooler if kube-up could just take a few inputs ("what cloud?", "what are your credentials?" etc.) and generate a finished Salt config for you, with a separate salt-cloud orchestration config for the provisioning. The current Salt config is a bit of a mess, and not really something you can build on.

Feel free to reach out to me (@atombender) on the Kubernetes Slack if you want to chat.

[1] https://kubernetes.slack.com/archives/kubernetes-novice/p146...

TheIronYuppie 3641 days ago

Great feedback! Both Kubernetes Anywhere (https://github.com/kubernetes/kubernetes-anywhere) and our documentation efforts (https://github.com/kubernetes/kubernetes.github.io) are very much in flight - and they're both coming by 1.4 (~90 days).