> Leaving aside the fact that I don’t believe anyone does this.

Everyone who uses GPUs with Kubernetes does exactly this. GPUs are not a native thing to Kubernetes.

> You just maybe proved the point that I originally asserted: the things you configure on kubernetes are the same things you configure on cloud VMs

You are of course entirely missing the point, and I’m not sure if you’re doing it on purpose or not.

You have 100 units of work that you need to run. A unit of work is some “thing” that needs a certain number of CPU cores, memory, GPUs and other user-defined resources. Each unit of work also needs an individual identity, distinct from other units of work.

Go and code something to run that workload on the minimum number of cloud VMs, taking into account cost and your own user-defined scaling policies, minimizing the amount of unused resources.

Now make it handle adapting to changes in the quantity and definitions of those units of work.

Now make it handle over-committing, allowing units of work to have hard and soft limits that depend on the utilization of the underlying hardware.

Now make it provision some form of secure identity per unit of work.

After you’ve spent time coding that, you’ll realize that:

1. It’s hard
2. You’ve re-invented part of Kubernetes
3. Your implementation is shit
4. It’s very much not “the same things you can configure on cloud VMs”
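To give a sense of scale, here is a rough sketch of only the first step in that challenge: packing units of work onto as few VMs as possible. It uses first-fit decreasing, a well-known bin-packing heuristic that is not guaranteed to find the true minimum. The names (`Unit`, `VM`, `pack`) and the VM sizes are made up for illustration, and the sketch assumes a single VM shape and ignores cost, scaling policies, over-commit and identity entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """A hypothetical unit of work and the resources it asks for."""
    name: str
    cpu: float  # cores
    mem: float  # GiB
    gpu: int = 0


@dataclass
class VM:
    """Remaining free capacity on one VM, plus the units placed on it."""
    cpu: float
    mem: float
    gpu: int
    units: list = field(default_factory=list)

    def fits(self, u: Unit) -> bool:
        return u.cpu <= self.cpu and u.mem <= self.mem and u.gpu <= self.gpu

    def place(self, u: Unit) -> None:
        self.cpu -= u.cpu
        self.mem -= u.mem
        self.gpu -= u.gpu
        self.units.append(u.name)


def pack(units, vm_cpu, vm_mem, vm_gpu):
    """First-fit decreasing: biggest units first, new VM only when none fits."""
    vms = []
    ordered = sorted(units, key=lambda u: (u.gpu, u.cpu, u.mem), reverse=True)
    for u in ordered:
        if u.cpu > vm_cpu or u.mem > vm_mem or u.gpu > vm_gpu:
            raise ValueError(f"{u.name} does not fit on an empty VM")
        target = next((vm for vm in vms if vm.fits(u)), None)
        if target is None:
            target = VM(vm_cpu, vm_mem, vm_gpu)
            vms.append(target)
        target.place(u)
    return vms


# 100 identical units of 2 cores / 4 GiB on 16-core / 64 GiB VMs.
units = [Unit(f"job-{i}", cpu=2, mem=4) for i in range(100)]
vms = pack(units, vm_cpu=16, vm_mem=64, vm_gpu=0)
print(len(vms))  # 13: CPU is the bottleneck, 8 units per VM
```

Even this toy version has to pick a sort order, a VM shape and a failure mode for oversized units, and it does nothing about units changing after placement.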
1. Kubernetes manifests still have you specify "requests" (memory/CPU allocation).
2. Getting 100 identical VMs is not difficult in the cloud.
The point I'm making is that the cloud has already abstracted a lot of these things away, and with Kubernetes we abstract those exact same things one layer further.
If Kubernetes were running on bare metal, I'd agree with you, though.