| Yeah exactly... This seems closer to an HPC problem, not a "cloud" problem. Related comment from 6 months ago about Kubernetes use cases: https://lobste.rs/s/kx1jj4/what_has_your_experience_with_kub... Summary: scale has at least 2 different meanings. Scaling in resources doesn't really mean you need Kubernetes. Scaling in terms of workload diversity is a better use case for it. Kubernetes is basically a knockoff of Borg, but Borg is designed (or evolved) to run diverse services (search, maps, gmail, etc.; batch and low latency). Ironically most people who run their own Kube clusters don't seem to have much workload diversity. On the other hand, HPC is usually about scaling in terms of resources: running a few huge jobs on many nodes. A single job will occupy an entire node (and thousands of nodes), which is what's happening here. I've never used these HPC systems but it looks like they are starting to run on the cloud. Kubernetes may still have been a defensible choice for other reasons, but as someone who used Borg for a long time, it's weird what it's turned into. Sort of like protobufs now have a weird "reflection service". Huh? https://aws.amazon.com/blogs/publicsector/tag/htcondor/ https://aws.amazon.com/marketplace/pp/Center-for-High-Throug... |
We migrated to k8s to A) standardize how we run containerized builds and get the benefit of "it works on my laptop" matching how it works in production (at least functionally), and B) have a common set of patterns for managing deployed software.
Resource scheduling only became of interest after we migrated, when we realized that aggregating our payloads let us use things like spot instances without jeopardizing availability.