by angarrido 61 days ago
Most people think it’s just GPU cost. In practice it’s coordination: model latency variance + queueing + retries under load. You don’t scale linearly, you get cascading slowdowns.
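
A rough way to see why this isn't linear, as a sketch only: treat one model replica as a single-server queue with Poisson arrivals (an M/G/1 assumption, not something measured from a real system) and use the Pollaczek-Khinchine formula for mean time spent waiting in the queue. The function names and the numbers below are made up for illustration; `cv` is the coefficient of variation of model latency, and the retry model assumes each attempt fails independently with a fixed probability.

```python
def mean_queue_wait(arrival_rate, mean_service, cv):
    """Mean wait in queue for an M/G/1 server (Pollaczek-Khinchine).

    arrival_rate: requests per second reaching the server
    mean_service: mean model latency in seconds
    cv: coefficient of variation of latency (std dev / mean)
    """
    rho = arrival_rate * mean_service  # utilization
    if rho >= 1.0:
        return float("inf")  # the queue grows without bound
    return rho * mean_service * (1.0 + cv ** 2) / (2.0 * (1.0 - rho))


def attempts_per_request(retry_prob, max_retries):
    """Expected attempts per request if each attempt is retried
    with probability retry_prob, up to max_retries times."""
    return sum(retry_prob ** k for k in range(max_retries + 1))


if __name__ == "__main__":
    mean_service = 1.0  # hypothetical 1 s mean latency
    for base_rate in (0.3, 0.6, 0.8):
        # Retries inflate the arrival rate the server actually sees.
        effective_rate = base_rate * attempts_per_request(0.1, 2)
        wait = mean_queue_wait(effective_rate, mean_service, cv=1.5)
        print(f"base={base_rate:.1f}/s effective={effective_rate:.2f}/s "
              f"mean queue wait={wait:.2f}s")
```

The (1 - rho) term in the denominator is the whole story: wait time blows up as utilization approaches 1, higher latency variance multiplies it, and retries push utilization up exactly when the system is already slow.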