FYI, 27 times per hour is basically nothing. With GPT-4 over the API, I make 2-3 completion requests a minute, for 30-60 minutes at a time, when building an LLM app. This happens for 3-4 hours per day.
At the upper bound, at roughly $2 per full-context 32k request, this would be $2 * 3 requests/minute * 60 minutes * 4 hours = $1440 a day.
Thankfully, I am using retrieval augmentation and context stuffing into the base 4k model, so costs are manageable.
The 32k context model cannot be deployed into a production app at this pricing as a more capable drop-in replacement for shorter-context models.
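For anyone who wants to plug in their own numbers, here is a rough sketch of that estimate. The $2 per request is the rounded figure used above; the function name and the lower-bound case (2 requests a minute, 30 minutes per hour, 3 hours a day) are my own illustrative reading of the ranges quoted, not a measured workload.

```python
def daily_cost(cost_per_request, requests_per_minute, minutes_per_hour, hours_per_day):
    """Estimated daily API spend in dollars for a steady request rate."""
    return cost_per_request * requests_per_minute * minutes_per_hour * hours_per_day

# Upper bound from the comment: $2/request, 3 requests/minute, 60 minutes, 4 hours.
upper = daily_cost(2.00, 3, 60, 4)

# Lower bound (assumption): 2 requests/minute, 30 active minutes per hour, 3 hours.
lower = daily_cost(2.00, 2, 30, 3)

print(f"upper bound: ${upper:.0f}/day")  # $1440/day
print(f"lower bound: ${lower:.0f}/day")  # $360/day
```

Even the low end of that range is a few hundred dollars a day at full 32k context, which is why the shorter-context model plus retrieval is the practical choice here.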
Depends heavily on your product. I can imagine there are quite a lot of use cases that have relatively infrequent API usage or highly cacheable responses.
Prompt processing scales quadratically with the context size (assuming OpenAI is still using a standard transformer architecture), but it is also fast compared to generating tokens, because the prompt tokens are processed in parallel as a batch rather than one at a time. So I wouldn't expect effective response times to go up quadratically. At most linearly, depending on the details of how they implement inference.
52/1.92 ≈ 27
416/1.92 ≈ 217
So using GPT-4 with 32k tokens, 27 times per hour, or 217 times per day, in terms of cost, is approximately the equivalent of another dev.
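The same break-even arithmetic as a small sketch. It assumes the figures implied by the division above: about $1.92 per full 32k-token call, and a dev costing $52 an hour or $416 a day (which works out to an 8-hour day). The variable and function names are just for illustration.

```python
COST_PER_CALL = 1.92        # dollars per full-context 32k call (figure used above)
DEV_COST_PER_HOUR = 52.0    # dollars (implied by 52/1.92)
DEV_COST_PER_DAY = 416.0    # dollars (implied by 416/1.92, i.e. 8 hours at $52)

def break_even_calls(budget, cost_per_call):
    """Number of calls whose total cost equals the given budget."""
    return budget / cost_per_call

calls_per_hour = break_even_calls(DEV_COST_PER_HOUR, COST_PER_CALL)
calls_per_day = break_even_calls(DEV_COST_PER_DAY, COST_PER_CALL)

print(f"{calls_per_hour:.1f} calls/hour")  # about 27
print(f"{calls_per_day:.1f} calls/day")    # about 217
```

Changing `COST_PER_CALL` to match a smaller average prompt shows how quickly the break-even call count rises once you are not filling the whole 32k window.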