| I built Jollyturns (https://jollyturns.com) which has a fairly large server-side component. Before starting the work on Jollyturns I worked at Google (I left in 2010) on a variety of infrastructure projects, where I got to use first hand Borg and all the other services running on it. When I started more than 5 years ago few of the options mentioned in the article were available, so I just used Xen running on bare metal. I just wanted to get things done as opposed to forever experimenting with infrastructure. Not to mention the load was inexistent, so everything was easy to manage. After I launched the mobile app, I decided to spend some time on the infrastructure. 2 years ago I experimented with Marathon, which was originally developed at Twitter, probably by a bunch of former Google employees. The reason is Marathon felt very much like Borg: you could see the jobs you launched in a pretty nice Web interface, including their log files, resource utilization and so on. Deploying it on bare metal machines however was an exercise in frustration since Marathon relied heavily on Apache Mesos. For a former Googler, Mesos had some weird terminology and was truly difficult to understand what the heck was going on. Running Marathon on top of it had major challenges: when long-running services failed you could not figure out why things were not working. So after about 2 weeks I gave up on it, and went back to manually managed Xen VMs. Around the same time I spent few days with Docker Swarm. If you're familiar with Borg/Kubernetes, Swarm is very different from it since - at least when it started, you had to allocate services to machines by hand. I wrote it off quickly since allocating dockerized services on physical machines was not my idea of cluster management. Last year I switched to Kubernetes (version 1.2) since it's the closest to what I expect from a cluster management system. The version I've been using in production has a lot of issues: high availability (HA) for its components is almost non-existent. I had to setup Kube in such a way that its components have some resemblance of HA. In the default configuration, Kubernetes installs its control components on a single machine. If that machine fails or is rebooted your entire cluster disappears. Even with all these issues, Kubernetes solves a lot of problems for you. The flannel networking infrastructure greatly simplifies the deployment of docker containers, since you don't need to worry about routing traffic between your containers. Even now Kubernetes doesn't do HA: https://github.com/kubernetes/kubernetes/issues/26852 https://github.com/kubernetes/kubernetes/issues/18174 Don't be fooled by the title of the bug report, the same component used in kubectl to implement HA could be used inside Kube' servers for the same reason. I guess these days the Google engineers working on Kubernetes have no real experience deploying large services on Borg inside Google. Such a pity! |