|
|
|
|
|
by nmfisher
1996 days ago
|
|
From the about section: > How much does maintaining the servers cost?
> It depends on the amount of traffic, but the minimum baseline is around several thousands of US dollars every month. This is expected as inference is very GPU intensive and a sufficient number of instances need to be spun up to handle thousands of requests coming in every minute. Everything is paid out of pocket. Wow, impressive commitment for something that's free. |
|
- Spot instances
- Aggressive autoscaling
- Micro batching
Can reduce inference compute spend by huge amounts (90% is not uncommon). ML, especially anything involving realtime inference, is an area where effective platform engineering makes a ridiculous difference even in the earliest days.
Source: I help maintain open source ML infra for GPU inference and think about compute spend way too much https://github.com/cortexlabs/cortex