by mike_hearn
793 days ago
It reduces the need. If they can get non-latency-sensitive users onto this API, then they only need to be provisioned to support their max interactive query load (ChatGPT) rather than peak API load, which can be arbitrarily high (as fast as the program generating the load can run). The lower pricing should move users across quite fast, and the higher efficiency will free up hardware and reduce the rate at which they need to grow it.
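
To put rough numbers on the provisioning argument, here is a toy sketch. Every figure in it is hypothetical (made-up hourly demand in arbitrary "GPU units"), and it assumes all API traffic can tolerate being deferred, which is surely too generous in practice. It only illustrates the difference between sizing for the combined peak and sizing for the interactive peak while deferred work soaks up idle capacity.

```python
# Hypothetical hourly demand in arbitrary "GPU units".
# These numbers are invented for illustration only.
interactive = [40, 30, 25, 30, 60, 90, 100, 80]   # latency-sensitive (chat) load
api = [20, 10, 150, 5, 60, 10, 30, 15]            # bursty programmatic load

# Without a batch tier, every request is served immediately,
# so capacity has to cover the worst combined hour.
capacity_realtime = max(i + a for i, a in zip(interactive, api))

# With a batch tier (assuming ALL API work can wait), capacity only
# has to cover the interactive peak; deferred work fills the idle slots.
capacity_batch = max(interactive)
idle = sum(capacity_batch - i for i in interactive)  # spare unit-hours in the window
backlog = sum(api)                                   # deferred unit-hours to absorb

print("capacity needed, all realtime:", capacity_realtime)
print("capacity needed, API deferred:", capacity_batch)
print("idle unit-hours:", idle, "backlog unit-hours:", backlog)
print("backlog fits in idle capacity:", backlog <= idle)
```

With these made-up numbers the realtime case needs 175 units while the deferred case needs 100, and the 300 unit-hours of deferred work fit inside the 345 idle unit-hours. Real traffic would not split this cleanly, but the direction of the effect is the point.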