I don't understand how the first clause in this sentence connects to the second.
With a simple, predictable workload --- what they have --- it can make sense to lean towards static scheduling, rather than dynamic schedulers. K8s and Nomad are both dynamic schedulers.
This is pretty basic stuff; it's super weird how urgently people seem to want to dunk on them for not using K8s. It comes across as people not understanding that there are other kinds of schedulers; that "scheduling" means what Borg did.
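To make the distinction concrete, here's a rough sketch. Everything in it is made up for illustration: the host names, the service names, the CPU numbers, and the first-fit rule, which is just a stand-in for "placement decided at runtime" and not how K8s or Nomad actually place work.

```python
from typing import Dict, List, Tuple

# Static scheduling: a human decides the placement ahead of time and writes it
# down. Host and service names here are hypothetical.
STATIC_PLACEMENT: Dict[str, List[str]] = {
    "app-01": ["web", "web"],
    "app-02": ["web", "worker"],
}


def static_schedule() -> Dict[str, List[str]]:
    """Return the fixed, pre-decided placement (nothing is computed)."""
    return {host: list(services) for host, services in STATIC_PLACEMENT.items()}


def dynamic_schedule(
    demands: List[Tuple[str, int]], capacity: Dict[str, int]
) -> Dict[str, List[str]]:
    """Place each workload at runtime on the first host with enough free CPU.

    First-fit is an assumption chosen for brevity; real dynamic schedulers
    use far more elaborate filtering and scoring.
    """
    free = dict(capacity)
    placement: Dict[str, List[str]] = {host: [] for host in capacity}
    for name, cpus in demands:
        for host in free:
            if free[host] >= cpus:
                free[host] -= cpus
                placement[host].append(name)
                break
        else:
            raise RuntimeError(f"no host has room for {name}")
    return placement
```

With a workload that rarely changes, the first version is a lookup table you can read and reason about; the second only starts paying for its complexity when demand shifts often enough that nobody wants to maintain that table by hand.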
We did! And it did work. And there are def some great things that I (we) love about k8s. Personally, the declarative aspect of it was chef's kiss. "I want 2 of these and 3 of these, please", and it just happens.
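For anyone who hasn't seen it, the "I want 2 of these and 3 of these" idea boils down to comparing desired state with what is actually running and acting on the difference. A toy sketch of that comparison, assuming hypothetical service names and a made-up `reconcile` helper (this is not the k8s API, just the shape of the idea):

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return the start/stop actions needed to make `running` match `desired`.

    Both arguments map a service name to an instance count.
    """
    actions: List[Tuple[str, str]] = []
    for name in sorted(set(desired) | set(running)):
        diff = desired.get(name, 0) - running.get(name, 0)
        if diff > 0:
            # Too few instances: start the missing ones.
            actions.extend([("start", name)] * diff)
        elif diff < 0:
            # Too many instances: stop the surplus.
            actions.extend([("stop", name)] * -diff)
    return actions


# "I want 2 of these and 3 of these, please"
desired = {"web": 2, "worker": 3}
running = {"web": 1, "worker": 4}
print(reconcile(desired, running))
```

A real controller runs this kind of comparison continuously, so the declared counts keep being enforced after crashes and node failures, not just at deploy time.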
Which is the primary reason why we did investigate k8s on-prem. We had already done the work to k8s-ify the apps, let's not throw that away. But running k8s on-prem is different than running your own k8s in the cloud is different than running on managed k8s in the cloud.
Providing all of the bits k8s needs to really work was going to really stretch our team, but we figured with the right support from a vendor, we could make it work. We worked up a spike of harvester + rancher + longhorn and had something that we could use as if it were a cloud. It was pretty slick.
Then we got the pricing on support for all of that, and decided to spend that half million elsewhere.
We own our hardware, we rent cabs and pay for power & network. We've got a pretty simple pxeboot setup to provision hardware with a bare OS that we can use with chef to provide the common bits needed.
It's not 'ultimately flexible in every way', but it's 'flexible enough to meet the needs of our workloads'.
What is your position at 37Signals and how do you like it? I'm really impressed by the innovation that comes out of you guys and the workplace culture you folks have.
Bare vanilla k8s or k3s is nice, but it doesn't do much outside of your homelab. Once you want k8s in production in the cloud, you have to start thinking about:
- loadbalancing and ingress controller
- storage
- network
- iam and roles
- security groups
- centralized logging
- registry management
- vulnerability scanning
- ci/cd
- gitops
And all this is no less complex with k8s than with nomad, bare docker or whatever they chose. And definitely no less complex because it is on a major cloud provider.
Hey Melingo, I noticed that you responded to a lot of different threads in this post. It seems like you are a bit dismissive of people's experiences using K8s. I have also run K8s at scale, and it is not easy, and it does not work out of the box on cloud providers. There are a ton of addons, knobs, and work that has to be done to build a sustainable and "production ready" version of K8s (for my requirements) in AWS.
K8s is NOT easy, and I do not believe that in its current form it is the pinnacle of deployment/orchestration technologies. I am waiting for what is next, because the pain I have personally experienced with K8s, and that I know others are feeling as well, means it is not a perfect solution for everything, and for some people it is definitely not usable.
At the end of the day it's a tool, and it is sometimes difficult to work with.
The main issues we faced with over 700 VMs were: outdated OS, full disks, exhausted inodes, broken hardware, missing backups or a missing backup strategy, and OOM.
K8s heals itself: it fixes out-of-memory conditions by restarting a pod, solves storage by shipping logs out and killing a pod if its disk still fills up, and has a rollout strategy, health checks and readiness probes.
It provides an easy deployment mechanism out of the box, adding a domain is easy, and certificates get renewed centrally and automatically.
Scaling is just a replica number and you have node auto-upgrade features built in.
K8s provides, out of the box, what people otherwise build manually, and it is certified, open source and battle tested.
I'm on GKE. The hosts and control plane are managed for me. All I need to do is build/test/security scan images and then promote/deploy the image (via Helm) when it goes out to prod.
Using config management means dealing with config drift and managing the underlying operating system, which is a lot more to think about and a lot more that can go wrong.