| HN Mirror



	by angarrido 108 days ago
	Most people think it’s just GPU cost. In practice the hard part is coordination: model latency variance + queueing + retries under load. You don’t scale linearly, you get cascading slowdowns.
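To put rough numbers on the "not linear" point, here is a minimal sketch using Kingman's approximation for a single-server queue. This is a textbook model swapped in for illustration, not something taken from the comment. The function names, the single-server assumption, and the simple retry model (each request is retried independently with a fixed probability) are all assumptions.

```python
def kingman_wait(utilization, mean_service_s, cv_arrival=1.0, cv_service=1.0):
    """Approximate mean time spent waiting in queue (seconds), G/G/1 queue.

    Kingman's approximation:
        wait ~= rho / (1 - rho) * (ca^2 + cs^2) / 2 * mean_service
    where ca and cs are the coefficients of variation (stddev / mean) of
    inter-arrival times and service times.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    variability = (cv_arrival ** 2 + cv_service ** 2) / 2
    return utilization / (1 - utilization) * variability * mean_service_s


def effective_utilization(base_utilization, retry_fraction):
    """Utilization once retries are counted as extra load.

    Assumption (hypothetical model): every attempt fails and is retried with
    probability retry_fraction, so each request costs 1 / (1 - retry_fraction)
    attempts on average.
    """
    if not 0 <= retry_fraction < 1:
        raise ValueError("retry_fraction must be in [0, 1)")
    return base_utilization / (1 - retry_fraction)


if __name__ == "__main__":
    service_s = 1.0  # assumed mean model latency per request
    for rho in (0.5, 0.8, 0.9, 0.95):
        steady = kingman_wait(rho, service_s)
        spiky = kingman_wait(rho, service_s, cv_service=2.0)
        print(f"rho={rho:.2f}  wait={steady:.1f}s  wait(high variance)={spiky:.1f}s")

    # Same base load, but 10% of attempts are retried.
    rho_retry = effective_utilization(0.8, 0.1)
    print(f"with retries: rho={rho_retry:.3f}  wait={kingman_wait(rho_retry, service_s):.1f}s")
```

Under these assumptions, going from 50% to 90% utilization multiplies the queueing delay by about 9, doubling the service-time variability (cv 1 to 2) multiplies it by 2.5 at any load, and a 10% retry rate at 80% base load roughly doubles the wait. Each factor compounds the others, which is one way to read "cascading slowdowns".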