by neonbjb 618 days ago
You're missing the fact that requests are batched. It's 70 tokens per second for you, but the same hardware is also serving tens to hundreds of other paying customers at that speed at the same time.
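To put rough numbers on the batching point, here is a back-of-the-envelope sketch. It makes the idealized assumption that every request in a batch decodes at the full 70 tokens per second, which real serving stacks will only approximate. The function name and the two batch sizes are illustrative choices, not figures from any provider.

```python
def aggregate_tokens_per_second(per_user_tps: float, batch_size: int) -> float:
    """Total tokens/s served, assuming every request in the batch
    decodes at per_user_tps (an idealization)."""
    return per_user_tps * batch_size


PER_USER_TPS = 70  # per-user speed quoted in the comment

# Low and high ends of the batch-size range mentioned, picked for illustration.
for batch_size in (10, 100):
    total = aggregate_tokens_per_second(PER_USER_TPS, batch_size)
    print(f"batch of {batch_size}: {total:.0f} tokens/s in aggregate")
```

Under that assumption the same hardware is producing somewhere between 700 and 7,000 tokens per second in total, not 70.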

All these efficiencies just increase OpenAI's margin on inference. Of course it's not "one cluster per customer", and of course a single customer can't saturate a cluster by themselves; my illustration was only meant to show that the economics work.