Hacker News
by fareesh 1340 days ago
I am currently in the middle of setting up AWS and that decision graph made me chuckle because it resonates quite deeply.

I need to do GPU inference but I don't want to run the machine 24x7. I may use it for about 4 hours per day at best. Lambda doesn't offer GPUs and neither does ECS+Fargate.
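
To put a rough number on the gap between 24x7 and 4 hours per day, here is a back-of-the-envelope sketch. The hourly rate is a made-up placeholder, not a real AWS price, and a 30-day month is assumed; plug in the on-demand rate for whichever GPU instance type you are considering.

```python
# Back-of-the-envelope comparison: always-on vs. ~4 hours/day of GPU time.
# HYPOTHETICAL_HOURLY_RATE is an illustrative placeholder, not a real AWS price.
HYPOTHETICAL_HOURLY_RATE = 1.00  # USD per instance-hour (assumption)
DAYS_PER_MONTH = 30              # assumption: 30-day month


def monthly_cost(hourly_rate, hours_per_day, days=DAYS_PER_MONTH):
    """Compute cost of running one instance for `hours_per_day`, every day."""
    return hourly_rate * hours_per_day * days


always_on = monthly_cost(HYPOTHETICAL_HOURLY_RATE, 24)
four_hours = monthly_cost(HYPOTHETICAL_HOURLY_RATE, 4)
savings_fraction = 1 - four_hours / always_on

print(f"24x7:        ${always_on:.2f}/month")
print(f"4 hours/day: ${four_hours:.2f}/month")
print(f"Savings:     {savings_fraction:.0%}")
```

The ratio does not depend on the rate: 4 of 24 hours is one sixth of the always-on bill, before any storage or startup overhead.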

It seems like I could set up an endpoint using SageMaker, destroy it when it's no longer needed, and automate all of this, but it feels quite messy.

The other route is perhaps I can launch an instance every day with ECS and then get rid of it.

All these routes seem quite inefficient. There seems to be something called Elastic Inference, where I can provision the right amount of GPU resources, but if I'm not mistaken I'll need a spare EC2 instance to do that, which is not ideal either.

I guess all this stems from the fact that there is no straightforward virtualization for GPU workloads, so they have to provision GPUs 1:1, which they are currently not equipped to do.

Has anyone run into a similar problem and found a more elegant solution? All of the above are very messy. Is there some obvious choice I am missing?

2 comments

Depends on what "4 hours per day" really means. If you want an interactive endpoint, putting in the work to set up an ECS task you can start and stop feels like the best approach. If you have longer-running inference tasks and just want to pick up results asynchronously, Batch (which is a layer on top of ECS) seems like the way to go.

SageMaker might have an abstraction which is a closer fit for your particular use case, but I'd be wary of potential cost excesses; running on raw EC2 and automating the lifetime somehow is inevitably going to be the cheapest route.
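
As a minimal sketch of "automating the lifetime" on raw EC2, something like the following could work. It assumes an already-created GPU instance that sits stopped between runs, and an `ec2` object that behaves like a boto3 EC2 client (`start_instances`, `stop_instances`, and the `instance_running` waiter are real boto3 calls). The `running_instance` helper, the instance ID, and `run_inference` are hypothetical names for illustration.

```python
from contextlib import contextmanager


@contextmanager
def running_instance(ec2, instance_id):
    """Start a stopped EC2 instance, yield its ID, and stop it afterwards.

    `ec2` is assumed to behave like a boto3 EC2 client.
    """
    ec2.start_instances(InstanceIds=[instance_id])
    try:
        # Block until EC2 reports the instance state as "running".
        # This does not guarantee that services on the box are ready yet.
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        yield instance_id
    finally:
        # Always request a stop, even if the inference job raised.
        ec2.stop_instances(InstanceIds=[instance_id])


# Usage (requires boto3 and AWS credentials; ID and job are placeholders):
#
#   import boto3
#   with running_instance(boto3.client("ec2"), "i-0123456789abcdef0"):
#       run_inference()  # hypothetical function that calls the GPU box
```

A stopped instance no longer accrues instance-hour charges, but its EBS volumes are still billed, so this is cheap rather than free between runs.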

The task takes about 1-4 minutes. In an ideal world, it would start immediately and then shut down, but more realistically it will have to spin up or batch things up at a convenient point in time. SageMaker seems more expensive than doing it via ECS.
If your GPU inference can run on an Intel integrated GPU, you could rent a dedicated server from OVH for ~$130 a month and use the integrated GPU on that. I don't know of any affordable dedicated servers with Nvidia GPUs from mature providers.
I pay a tiny fraction of that for a dedicated server from Hetzner with a modern iGPU that transcodes and streams video 24/7.