|
|
|
|
|
by calebkaiser
1996 days ago
|
|
The price of GPU inference can be brutal, but there's a lot you can do on the infra side to improve it: - Spot instances - Aggressive autoscaling - Micro batching Can reduce inference compute spend by huge amounts (90% is not uncommon). ML, especially anything involving realtime inference, is an area where effective platform engineering makes a ridiculous difference even in the earliest days. Source: I help maintain open source ML infra for GPU inference and think about compute spend way too much https://github.com/cortexlabs/cortex |
|