| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by neonbjb 666 days ago
	You're missing the fact that requests are batched. It's 70 tokens per second for you, but also for 10s-100s of other paying customers at the same time.

1 comments

lukev 665 days ago

All these efficiencies just increase OpenAI's margin on inference. Of course it's not "one cluster per customer" and of course a customer can't saturate a cluster by themselves, my illustration was only to point out that the economics work.

link