
In general, Cortex will be significantly cheaper, because you're only paying AWS for EC2 (the bulk of the bill) and the other AWS services the cluster uses (a much smaller portion of the bill). With SageMaker, you're paying the EC2 bill plus a premium of roughly 40%.
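To make the comparison concrete, here's a rough back-of-the-envelope sketch. The dollar amounts are made-up placeholders, not real AWS prices, and the 40% figure is just the approximate premium mentioned above:

```python
# Hypothetical figures for illustration only; plug in your own AWS numbers.
SAGEMAKER_PREMIUM = 0.40  # approximate markup over the EC2 cost


def sagemaker_cost(ec2_cost: float) -> float:
    """EC2 cost plus the approximate SageMaker premium."""
    return ec2_cost * (1 + SAGEMAKER_PREMIUM)


def cortex_cost(ec2_cost: float, other_services: float) -> float:
    """EC2 cost plus the (much smaller) bill for the other AWS services used."""
    return ec2_cost + other_services


ec2 = 1000.0   # hypothetical monthly EC2 bill, USD
other = 50.0   # hypothetical monthly cost of the other AWS services, USD
print(f"SageMaker: ${sagemaker_cost(ec2):,.2f}")
print(f"Cortex:    ${cortex_cost(ec2, other):,.2f}")
```

The point is that the premium scales with the EC2 bill, while the other services stay a small, roughly fixed add-on.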

To keep the AWS bill as low as possible, Cortex supports inference on spot instances, which are spare EC2 capacity that AWS sells at a steep discount (up to 90%). The drawback is that AWS can reclaim a spot instance when it needs the capacity back, but for ML inference, failover isn't a big deal, since you typically don't need to preserve state.
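Here's a minimal sketch of why statelessness makes failover cheap. This isn't how Cortex handles it internally. The replica callables and the `ReplicaReclaimed` exception are hypothetical stand-ins for "a replica whose spot instance got reclaimed":

```python
from typing import Callable, Optional, Sequence


class ReplicaReclaimed(Exception):
    """Hypothetical error: the spot instance serving this replica was reclaimed."""


def predict_with_failover(
    replicas: Sequence[Callable[[dict], dict]], payload: dict
) -> dict:
    """Send the same request to each replica in turn until one succeeds.

    Inference is stateless, so nothing has to be migrated off the reclaimed
    replica: the full request is simply resent to the next one.
    """
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica(payload)
        except ReplicaReclaimed as err:
            last_error = err  # remember the failure and try the next replica
    raise RuntimeError("all replicas were reclaimed") from last_error


# Usage: the first (hypothetical) replica is interrupted, the second answers.
def reclaimed_replica(payload: dict) -> dict:
    raise ReplicaReclaimed("spot instance interrupted")


def healthy_replica(payload: dict) -> dict:
    return {"label": "cat", "input": payload["image_id"]}


print(predict_with_failover([reclaimed_replica, healthy_replica], {"image_id": 7}))
```

A stateful workload (e.g. a training job) would have to checkpoint and restore progress instead of just retrying.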

If you use spot instances, choose the cheapest instance type that can serve your model, and set your autoscaler's minimum replicas to 1 (so it won't keep extra replicas idling), you should be able to deploy the model pretty cheaply. At the very least, it will be significantly cheaper than with SageMaker.
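A quick way to see why the minimum replica count matters: it sets the floor you pay even with zero traffic. A rough sketch, where the hourly price and the 70% spot discount are hypothetical (real spot discounts vary by instance type and region):

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def idle_floor_cost(on_demand_hourly: float, spot_discount: float, min_replicas: int) -> float:
    """Monthly cost of the replicas the autoscaler keeps running at all times."""
    if not 0 <= spot_discount < 1:
        raise ValueError("spot_discount must be in [0, 1)")
    spot_hourly = on_demand_hourly * (1 - spot_discount)
    return spot_hourly * min_replicas * HOURS_PER_MONTH


# Hypothetical: $0.10/hour on-demand price, 70% spot discount.
for replicas in (1, 3):
    print(f"min_replicas={replicas}: ${idle_floor_cost(0.10, 0.70, replicas):.2f}/month")
```

Anything above that floor is only paid for while the autoscaler has scaled up to handle real traffic.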

There's some more info here: https://www.cortex.dev/cluster-management/spot-instances