Hacker News
by mmoskal, 486 days ago
Their tech report says one inference deployment is around 400 GPUs...
fspeech, 485 days ago
You need that scale to optimize load balancing. Unfortunately, that gain is not available to small or individual deployments.