Hacker News
by tony_cannistra 787 days ago
Yeah, this makes sense. I do wonder, though, how it changes the dynamics around provisioned capacity, if at all.

It reduces the need. If they can get non-latency-sensitive users onto this API, then they only need to provision for their maximum interactive query load (ChatGPT) rather than for peak API load, which can be arbitrarily high (as high as however fast the program generating the load can run). The lower pricing should move users across quite fast, and the higher efficiency will free up hardware and slow the rate at which they need to grow it.
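To make the provisioning argument concrete, here is a toy sketch. Everything in it is an assumption for illustration: the per-slot load numbers are made up, and the greedy "fill the troughs" scheduler (`deferred_schedule`) is a hypothetical stand-in, not how OpenAI actually schedules batch jobs. It compares the capacity needed if batch work is served the moment it arrives with the capacity needed if it can wait for spare room under the interactive peak.

```python
def peak_capacity_immediate(interactive, batch_arrivals):
    """Capacity needed if batch work is served in the same slot it arrives."""
    return max(i + b for i, b in zip(interactive, batch_arrivals))


def deferred_schedule(interactive, batch_arrivals, capacity):
    """Hypothetical greedy scheduler: queue batch work and run it only in
    capacity left over after interactive load. Returns (served, backlog)."""
    backlog = 0
    served = []
    for i, b in zip(interactive, batch_arrivals):
        backlog += b                      # new batch work joins the queue
        spare = max(capacity - i, 0)      # room left after interactive traffic
        s = min(spare, backlog)           # run as much queued work as fits
        served.append(s)
        backlog -= s
    return served, backlog


if __name__ == "__main__":
    # Made-up load per time slot, in arbitrary capacity units.
    interactive = [30, 50, 90, 100, 70, 40, 20, 20]
    batch = [0, 0, 60, 60, 0, 0, 0, 0]

    print(peak_capacity_immediate(interactive, batch))   # 160
    served, backlog = deferred_schedule(interactive, batch, max(interactive))
    print(served, backlog)   # [0, 0, 10, 0, 30, 60, 20, 0] 0
```

In this toy case, provisioning only for the interactive peak (100 units) still clears all 120 units of batch work within the window, whereas serving it immediately needs 160. The saving only holds if the batch deadline is loose enough for the backlog to drain, which is the trade users accept for the lower price.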
That's the way it seems to me as well. Curious too about the business implications. My guess is that they wanted to bite the bullet and commit to provisioned capacity but wanted to do so in a way that didn't require massive overprovisioning for API requests.
They're well beyond that point now, I guess. MS has been building whole datacenters just for OpenAI.