by mmoskal 486 days ago
Their tech report says one inference deployment is around 400 GPUs...
1 comment

You need that scale to optimize load balancing. Unfortunately, that gain is not available to small or individual deployments.