by ethagnawl 640 days ago
I have not read too deeply into this, but do any of these serverless environments offer GPUs? I'm sure there are ... reasons, but the lack of GPU support in Lambda and Fargate remains a major pain point for AWS users.

It's been keeping me wrangling EC2 instances for ML teams but I do wonder how much longer that will last.

5 comments

The major clouds don't support serverless GPU because the architecture is fundamentally different from running CPU workloads. For Lambda specifically, there's no way of running multiple customer workloads on a single GPU with Firecracker.

A more general issue is that the workloads that tend to run on GPU are much bigger than a standard Lambda-sized workload (think a 20Gi image with a smorgasbord of ML libraries). I've spent time working around this problem and wrote a bit about it here: https://www.beam.cloud/blog/serverless-platform-guide
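
To put rough numbers on the image-size point, here is a back-of-the-envelope sketch. The bandwidth figures are assumptions for illustration, not measurements from any provider, and the estimate ignores decompression, layer caching, and model loading onto the GPU.

```python
def pull_seconds(image_gib: float, bandwidth_gbit_per_s: float) -> float:
    """Estimate seconds to transfer an image at a sustained bandwidth.

    Assumes the transfer is purely bandwidth-bound (hypothetical best case).
    """
    image_bits = image_gib * 1024**3 * 8           # GiB -> bits
    return image_bits / (bandwidth_gbit_per_s * 1e9)  # Gbit/s -> bit/s


if __name__ == "__main__":
    # Assumed sustained bandwidths in Gbit/s; pick values matching your setup.
    for gbit in (1, 10, 25):
        print(f"20 GiB at {gbit} Gbit/s: {pull_seconds(20, gbit):.1f} s")
```

Even in this optimistic model, transferring a 20 GiB image adds many seconds to a cold start, which is why GPU platforms tend to lean on caching or lazy loading rather than pulling the full image per invocation.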

> there's no way of running multiple customer workloads on a single GPU with Firecracker.

You can do this with SR-IOV-enabled hardware.

https://docs.nvidia.com/networking/display/mlnxofedv581011/s...

The only big one I know of is Cloud Run on GCP.

https://cloud.google.com/run/docs/configuring/services/gpu

This sounds very compelling. Thanks!
I know for sure this has been on AWS's roadmap for multiple years now. re:Invent is near. Let's see if they can ship.
The big guys are lagging a bit, but there are many smaller parties offering serverless GPU.

I've been a quite satisfied customer of Runpod's serverless GPU offering, running a side project that uses computer vision to detect toxic clouds in webcam feeds of an industrial site.

If you want generative AI, try Replicate, as they offer a more specialized product.

They use GPUs under the hood for inference/fine-tuning and charge by token. Fireworks will even let you deploy a LoRA serverless at the same pricing as the base model.

But I'm not aware of any “lambda”-like serverless for any old CUDA workload. Given loading times, it wouldn’t really make sense. Something like Cloud Run or Knative for GPUs would be cool.