by calebkaiser
2165 days ago
I maintain an open source ML infra project, where we've spent a ton of time on cost optimization for running GPU-intensive ML models, specifically on AWS: https://github.com/cortexlabs/cortex

If you've done zero optimization so far, there is likely some real low-hanging fruit:

1. If GPU instances are running up a huge EC2 bill, switch to spot instances (a g4dn.xlarge spot is $0.1578/hr in US West (Oregon) vs $0.526/hr on demand).

2. If inference costs are high, look into Inferentia (https://docs.cortex.dev/deployments/inferentia). For certain models, we've benchmarked over 4x improvements in efficiency. Additionally, autoscaling more conservatively and leveraging batch prediction wherever possible can make a real dent.

3. Finally, and likely the lowest-hanging fruit of all, talk to your AWS rep. If your situation is dire, there's a very good chance they'll throw some credits your way while you figure things out.

If you're interested in trying Cortex out, AI Dungeon wrote a piece on how they used it to bring their spend down ~90%. For context, they serve a 5 GB GPT-2 model to thousands of players every day: https://medium.com/@aidungeon/how-we-scaled-ai-dungeon-2-to-...
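
To put the spot numbers from point 1 in perspective, here is a rough back-of-the-envelope sketch. The two hourly prices are the ones quoted above; the 730 hours/month figure, the always-on instance, and the function names are my own assumptions for illustration. Spot prices move over time, so plug in whatever the current price is for your region.

```python
# Prices quoted in the comment (g4dn.xlarge, US West (Oregon)).
ON_DEMAND_PER_HOUR = 0.526
SPOT_PER_HOUR = 0.1578

# Assumption: an instance that runs around the clock, ~730 hours per month.
HOURS_PER_MONTH = 730


def monthly_cost(price_per_hour: float, instances: int = 1,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `instances` machines for `hours` at a flat hourly price."""
    return price_per_hour * instances * hours


def spot_savings_fraction(on_demand: float, spot: float) -> float:
    """Fraction of the on-demand bill saved by paying the spot price instead."""
    return 1.0 - spot / on_demand


if __name__ == "__main__":
    on_demand = monthly_cost(ON_DEMAND_PER_HOUR)
    spot = monthly_cost(SPOT_PER_HOUR)
    saved = spot_savings_fraction(ON_DEMAND_PER_HOUR, SPOT_PER_HOUR)
    print(f"on demand: ${on_demand:.2f}/month")
    print(f"spot:      ${spot:.2f}/month")
    print(f"savings:   {saved:.0%}")
```

At the quoted prices that works out to roughly a 70% reduction per instance, before accounting for interruptions or any fallback to on-demand capacity.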