by mnahkies 1279 days ago
It's not quite Lambda, but GKE Autopilot supports GPU workloads, so it could be a relatively easy way to do this.

You could have a REST service put incoming requests onto a queue, and a processor deployment pull work off the queue, using GPU resource requests and spot instances. You'd probably also want something that scales the processor deployment's replicas based on the queue depth and your budget.
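
To make the scaling part concrete, here's a rough sketch of how the replica count could be derived from queue depth with a budget cap. All the names and numbers (jobs_per_replica, hourly_cost_per_replica, hourly_budget) are made up for illustration, and it only computes a target; something else would still have to apply that target to the deployment.

```python
import math


def desired_replicas(
    queue_depth: int,
    jobs_per_replica: int,
    hourly_cost_per_replica: float,
    hourly_budget: float,
    min_replicas: int = 0,
) -> int:
    """Return a target replica count for the processor deployment.

    Hypothetical helper: the parameters are illustrative assumptions,
    not values taken from GKE or any pricing page.
    """
    if jobs_per_replica <= 0:
        raise ValueError("jobs_per_replica must be positive")
    if hourly_cost_per_replica <= 0:
        raise ValueError("hourly_cost_per_replica must be positive")

    # Replicas needed to drain the current backlog at the assumed throughput.
    wanted = math.ceil(max(queue_depth, 0) / jobs_per_replica)

    # Replicas the budget allows; treated as a hard ceiling.
    affordable = math.floor(hourly_budget / hourly_cost_per_replica)

    # Honour the floor, but never exceed what the budget allows.
    return max(0, min(affordable, max(min_replicas, wanted)))
```

With an empty queue this returns min_replicas (0 by default), so the GPU nodes can scale away entirely when there's nothing to do.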

I haven't compared the pricing to EKS, so I'm unsure whether it would really be better financially, but it would avoid having to explicitly manage scaling GPU nodes up and down.

https://cloud.google.com/kubernetes-engine/docs/how-to/autop...