[DISCLAIMER] I work at AWS, but I'm not speaking for my employer.

We'd need more details on your infrastructure to be specific, but I assume it's the EC2 instance cost that skyrocketed? A couple of pointers:

- Experiment with different GPU instance types.
- Try Inferentia [1], a dedicated ML inference chip. Most popular ML frameworks are supported by the Neuron compiler.

Assuming you manage your instances in an Auto Scaling group (ASG):

- Enable a target tracking scaling policy to reactively scale your fleet. [2] The best scaling metric depends on your inference workload.
- If your workload is predictable (e.g. high traffic during the daytime, low traffic at night), enable predictive scaling. [3]

[1] https://aws.amazon.com/machine-learning/inferentia/
[2] https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-sca...
[3] https://docs.aws.amazon.com/autoscaling/plans/userguide/how-...
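
To make the two ASG suggestions above a bit more concrete, here is a rough sketch of what the policies could look like with boto3's `put_scaling_policy`. Treat it as an illustration under assumptions, not a recommendation: the group name `inference-asg`, the helper function names, the CPU-based metrics, and the 50% target are all placeholders I made up. For GPU inference, CPU utilization is often the wrong signal, so you would likely swap in a metric that reflects your actual load. The predictive policy is left in forecast-only mode so you can inspect the forecast before letting it scale anything.

```python
from typing import Any, Dict


def target_tracking_policy(asg_name: str, target_cpu_percent: float = 50.0) -> Dict[str, Any]:
    """Build put_scaling_policy kwargs for a target tracking policy.

    Assumption: average CPU is used only as a placeholder metric.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-target-tracking",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu_percent,
        },
    }


def predictive_policy(
    asg_name: str, target_cpu_percent: float = 50.0, forecast_only: bool = True
) -> Dict[str, Any]:
    """Build put_scaling_policy kwargs for a predictive scaling policy.

    With forecast_only=True the policy only produces forecasts and does not scale.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-predictive",  # hypothetical naming scheme
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [
                {
                    "TargetValue": target_cpu_percent,
                    "PredefinedMetricPairSpecification": {
                        "PredefinedMetricType": "ASGCPUUtilization",
                    },
                }
            ],
            "Mode": "ForecastOnly" if forecast_only else "ForecastAndScale",
        },
    }


def apply_policies(asg_name: str) -> None:
    """Send both policies to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builders above work without it

    client = boto3.client("autoscaling")
    client.put_scaling_policy(**target_tracking_policy(asg_name))
    client.put_scaling_policy(**predictive_policy(asg_name))
```

Calling `apply_policies("inference-asg")` would attach both policies to that (hypothetical) group. Keeping the request bodies in plain functions makes it easy to review or diff them before anything touches your account.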