


	by mmoskal 486 days ago
	Their tech report says one inference deployment is around 400 GPUs...

1 comment

You need that scale to optimize load balancing. Unfortunately, that gain is not available to small or individual deployments.