by spost 2135 days ago
I work for a startup whose product is small clusters (half a dozen servers, if relatively beefy ones) that will be run on-prem by customers, at least sometimes in a low-to-no-touch capacity. Most of our application components are micro-ish services that are run on all hosts in the cluster for either extra capacity or fault tolerance.

We currently run everything on mesos/marathon, but are looking to switch away from it. K8s is kinda the “default” option, and is potentially appealing to some prospective acquirers and investors.

But I never really see k8s being talked about in that context of “physical hardware that’s on prem, but not on MY prem.” Is there a reason for that? If we go with k8s is it going to bite us? Does anyone have experience with something like that they could share?

8 comments

> But I never really see k8s being talked about in that context of “physical hardware that’s on prem, but not on MY prem.” Is there a reason for that? If we go with k8s is it going to bite us? Does anyone have experience with something like that they could share?

Kubernetes provides a leaky abstraction above the underlying hardware - the storage and networking are going to be different depending on who is maintaining the Kubernetes cluster. Kubernetes's strength is that it acknowledges the leakiness of the abstraction and makes it explicit. If your customer uses a specific networking and storage provider, Kubernetes makes it easier for you to say (or not) that you have certified your product for those networking and storage providers, and here's what the manifests look like, because there's a standard way of configuring the application to work with that networking (CNI, which powers the standard Service as well as maybe NetworkPolicies) and storage (CSI, specifically StorageClass) provider.
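
To make the StorageClass point concrete, here is a minimal sketch in Python that renders the same PersistentVolumeClaim for two customer sites, with only `storageClassName` changing. The names `render_pvc`, `app-data`, `customer-a-ceph` and `customer-b-nfs` are hypothetical, picked for illustration; the real class names would come from whatever CSI provider each customer runs.

```python
import json


def render_pvc(name, storage_class, size="10Gi"):
    """Build a PersistentVolumeClaim manifest as a plain dict.

    Only storage_class is expected to differ between customer sites.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical StorageClass names for two customer sites.
    for storage_class in ("customer-a-ceph", "customer-b-nfs"):
        print(json.dumps(render_pvc("app-data", storage_class), indent=2))
```

The output is JSON, which `kubectl apply -f` accepts just like YAML, so the same template could be shipped per certified provider.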

If you just provide Docker images, or VM appliances, then Murphy promises you that you're going to get frustrated support calls from customers saying "your application is slow and we don't understand why." Good luck then.

I have some experience with this. The way I see it, if your software runs on somebody else's on-prem hardware and you move to k8s, the change in mentality and demarcation of responsibilities matters more than replacing the stack between Linux and your app: now your app "runs on k8s", your clients are responsible for that layer (or they can contract that out), and you are responsible only for your app.

It helps abstract away everything in the stack below your app and is conceptually easier on everybody; now the clients can train in k8s and use the same set of tooling that usually goes with k8s, like Prometheus/Grafana, the same or similar RBAC access, etc.

OTOH realistic expectations need to be set; just because it's k8s doesn't mean there will be no problems, or that the learning/adaptation will come without some pain. I suggest writing down some standard procedures for your clients (upgrades etc.) and picking the same set of tooling for all of them (same dashboard, same logging/monitoring/alerting, etc.) as a way to homogenize ("standardize") them.

Feel free to email me.

There's a great talk from Chick-Fil-A about running kube clusters on bare metal [0]. I'm also going to be taking up a similar problem soon-ish, and I'm looking into Container Linux and its derivatives for doing a lot of the bare metal provisioning, updating, and rollbacks they talk about here. If anyone here has worked on this project, or similar, it would be awesome to get in touch!

[0] - https://www.youtube.com/watch?v=8edDcy3oeUo

One reason, for example, can be providing better reliability to customers who were running a certain application on premises and need/want to continue doing so. I've observed how our solution went from a monolithic blob of C++ code, which crashed everything upon major failures; to several modules, where your monitoring could crash and restart and hopefully service wouldn't be interrupted, but when the performance part crashed all service stopped and maintenance time was long; to k8s, where all parts are split into separate containers and the performance part is split into smaller, completely redundant chunks, so when one crashes it is a) restarted without affecting the other 99% of users, and b) restarted much quicker, ten times quicker, meaning less downtime for the 1% of affected users.
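
As a rough back-of-the-envelope for those numbers, the sketch below weighs downtime by the share of users affected. The 1% blast radius and the tenfold faster restart come from the description above; the 10-minute monolith restart time is purely an assumed placeholder, and the comparison is per crash, assuming crashes happen equally often in both setups.

```python
def user_weighted_downtime(fraction_affected, restart_minutes):
    """Downtime per crash, weighted by the share of users who see it."""
    return fraction_affected * restart_minutes


# Assumed placeholder, not a measured value.
MONOLITH_RESTART_MINUTES = 10.0

# Monolith: one crash takes everyone down for the full restart.
before = user_weighted_downtime(1.0, MONOLITH_RESTART_MINUTES)

# Split into redundant chunks: 1% of users affected, restart 10x quicker.
after = user_weighted_downtime(0.01, MONOLITH_RESTART_MINUTES / 10)

if __name__ == "__main__":
    print(f"before: {before:.2f} user-weighted minutes per crash")
    print(f"after:  {after:.2f} user-weighted minutes per crash")
    print(f"improvement: roughly {before / after:.0f}x per crash")
```

The two effects multiply, which is why a smaller blast radius combined with faster restarts pays off so much more than either alone.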

But k8s introduced a non-trivial amount of complexity and its own bugs and maintenance cost, meaning new, separate engineers to maintain and develop just the k8s tooling. And we had to rewrite a lot of legacy code. But the trade-off is much better for a big project. Cutting downtime by an order of magnitude and being able to boast about it to the board is apparently priceless :) .

Really provisioning a VM manually is equivalent to provisioning a physical server. Just commented on this above.

Try using e.g. Ubuntu with some kind of centralised management tool, like Salt, and install k8s. For better control, use Flux to store your k8s configuration (deployments, configs, etc.) in Git. I believe it would be good for your sanity.

Else your k8s objects will be susceptible to someone doing a klutz and "whoops" your applications are gone, real gone...

I know companies that use these folks to help manage “on other folks prem” K8s deployments: https://platform9.com/

I did an on-prem k8s deployment at my last place. It is definitely challenging compared to EKS and GKE, but the difficulty is not in base k8s.

Following the kubeadm getting started guide on the kubernetes.io site can get you an 'ha', 'production ready' cluster going in a couple of hours. Most of it is pretty mechanical, and only needs a couple of key decisions, mainly your networking plugin. Generally the most popular ones have instructions as part of the getting started guide, making the process straightforward.

Where it quickly becomes difficult is after this step. You have a cluster ready to serve workloads, but it has no storage, no ingress/external load balancer.

Storage can be as simple as NFS volumes (you don't even need a provider for this, but you should use one anyway). Rook/Ceph will work, but now you've just taken on two complex technologies instead of one.

Without an external load balancer of some sort, you will have trouble getting traffic into your cluster, and it likely won't be actually HA. You can use MetalLB for this, or appliances. If you're just starting out though, you can totally get away with setting up DNS aliases to your nodes (CNAMEs or, more commonly for round robin, multiple A records on one name) in a round robin type fashion. It won't be HA, but it will work, and is simple and straightforward.
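
To illustrate why the DNS approach works but isn't HA, here is a small simulation sketch in Python. It hands out node addresses in strict rotation and counts requests that land on a dead node. The function name and the IP addresses are made up for illustration, and real resolvers cache and reorder answers, so actual behavior is less tidy than this.

```python
from itertools import cycle, islice


def simulate_round_robin(nodes, healthy, requests):
    """Rotate through node addresses the way round-robin DNS would,
    and return how many requests were sent to an unhealthy node.

    DNS has no health checks, so dead nodes stay in the rotation.
    """
    failed = 0
    for node in islice(cycle(nodes), requests):
        if node not in healthy:
            failed += 1
    return failed


# Hypothetical node addresses.
NODES = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

if __name__ == "__main__":
    all_up = simulate_round_robin(NODES, set(NODES), 300)
    one_down = simulate_round_robin(NODES, set(NODES[:2]), 300)
    print(f"all nodes up: {all_up} of 300 requests failed")
    print(f"one node down: {one_down} of 300 requests failed")
```

With one of three nodes down, about a third of requests keep failing until someone edits the DNS records, which is the gap MetalLB or an appliance is meant to close.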

Ingress is pretty easy to set up for the most part. Usually it's just a matter of applying an available manifest with a tweak or two. If you go the CNAME route, you will need an ingress set up so you can serve http/https on standard ports without too many issues.

If you do all these things, then you have a real deal cluster. Things like ingresses are recommended even if you're running in the cloud, so you may find that you're not all that far off from what you might find there.

Overall, the biggest trouble is all the choices you need to make. If you're starting out, maybe read up on two or three of the most popular choices for each step, and then just pick one. Anything that exists entirely within the cluster can usually be expressed purely as source controlled manifests, and kubeadm deployments can be simple shell scripts if you don't make them do everything (i.e. only support one container driver, not all of them).

One major caveat: if you screw up your network layer, you basically have to start over. This isn't strictly true, but it's the one area where you are often better off starting over when you need to make fundamental changes to your network setup (like podCIDR and serviceCIDR, or your network plugin). Pretty much everything else can be made to work with multiple setups at once, or you just need to delete and redeploy that component.
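
Since the CIDRs are so painful to change later, a cheap sanity check before running kubeadm can help. This is a minimal sketch using Python's standard `ipaddress` module to flag overlapping ranges; `find_overlaps` is a hypothetical helper and the ranges shown are example values only, so substitute the ones from your own site.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges):
    """Return pairs of range names whose CIDR blocks overlap.

    ranges maps a label (e.g. "pods") to a CIDR string.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # Example values only; use the ranges planned for your cluster.
    planned = {
        "pods": "10.244.0.0/16",
        "services": "10.96.0.0/12",
        "nodes": "192.168.1.0/24",
    }
    clashes = find_overlaps(planned)
    print("no overlaps" if not clashes else f"overlapping ranges: {clashes}")
```

For customer sites it is worth adding the customer's existing LAN and VPN ranges to the dict as well, since those are the ones you don't control.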

Kube-vip is another good alternative to MetalLB.

You may be interested in metal3.io. And yes, it will “bite” you, but so will everything else.