Y
Hacker News
new
|
ask
|
show
|
jobs
by
angarrido
61 days ago
must people think it’s just GPU cost. In practice it’s coordination: model latency variance + queueing + retries under load. You don’t scale linearly, you get cascading slowdowns.