| I am currently in the middle of setting up AWS and that decision graph made me chuckle because it resonates quite deeply. I need to do GPU inference but I don't want to run the machine 24x7. I may use it for about 4 hours per day at best. Lambda doesn't offer GPUs and neither does ECS+Fargate. It seems like I could set up an endpoint using SageMaker, destroy it when it's no longer needed, and automate all of this, but it feels quite messy. The other route is perhaps to launch an instance every day with ECS and then get rid of it. All these routes seem quite inefficient. There seems to be something called Elastic Inference where I can provision the right amount of GPU resources, but it seems like I'll need a spare EC2 instance to do that if I'm not mistaken, which is not ideal either. I guess all this stems from the fact that there is no straightforward virtualization for GPU workloads, so they have to provision them 1:1, which they are currently not equipped to do. Has anyone run into a similar problem and found a more elegant solution? All of the above are very messy. Is there some obvious choice I am missing? |
SageMaker might have an abstraction that is a closer fit for your particular use case, but I'd be wary of potential cost excesses; running on raw EC2 and automating the instance lifecycle yourself (start it when you need it, stop it when you're done) is almost certainly going to be the cheapest route.
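
For what it's worth, the raw EC2 route doesn't have to be much code. Here's a rough sketch of what I mean, assuming you already have a GPU instance set up and sitting in the stopped state, and that you drive it with boto3. The helper name `running_instance` and the instance ID in the usage comment are made up for illustration; `start_instances`, `stop_instances` and the `instance_running` waiter are regular boto3 EC2 client calls.

```python
from contextlib import contextmanager


@contextmanager
def running_instance(ec2, instance_id):
    """Start an existing (stopped) EC2 instance, and stop it again on exit.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    `running_instance` is a hypothetical helper name, not part of boto3.
    """
    ec2.start_instances(InstanceIds=[instance_id])
    try:
        # Block until EC2 reports the instance as running.
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        yield instance_id
    finally:
        # Always request a stop, even if the workload raised an exception,
        # so a failed job doesn't leave a GPU instance billing overnight.
        ec2.stop_instances(InstanceIds=[instance_id])


# Usage (placeholder instance ID, assumes AWS credentials are configured):
#
#   import boto3
#   with running_instance(boto3.client("ec2"), "i-0123456789abcdef0"):
#       ...  # send your inference requests here
```

Keep in mind that a stopped instance no longer bills for compute, but its EBS volumes are still charged, and "running" only means the VM is up, not that your model server is ready to accept requests.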