| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by YetAnotherNick 1144 days ago
	32k context is $1.92 for each request.

3 comments

nico 1144 days ago

For reference, a dev making USD 100k/year and working about 240 days a year, 8 hours/day = total of 1920 hours, or about USD 52/hour, USD 416/day

52/1.92 = 27 416/1.92 = 217

So using GPT-4 with 32k tokens, 27 times per hour, or 217 times per day, in terms of cost, is approximately the equivalent of another dev

link

ukuina 1144 days ago

FYI, 27 times per hour is basically nothing. With GPT4 over the API, I make 2-3 completion requests a minute, for 30-60 minutes at a time, when building an LLM app. This happens for 3-4 hours per day.

At the upper bound, this would be $2 * 3 * 60 * 4 = $1440 a day.

Thankfully, I am using retriever-augmentation and context stuffing into the base 4k model, so costs are manageable.

The 32k context model cannot be deployed into a production app at this pricing as a more capable drop-in replacement for shorter-context models.

link

ZephyrBlu 1144 days ago

Depends heavily on your product. I can imagine there are quite a lot of use cases that have relatively infrequent API usage or highly cacheable responses.

link

bomewish 1143 days ago

> retriever-augmentation and context stuffing

Care you elaborate? This sounds very interesting & useful. Just anything about the setup and implementation would be super helpful.

link

ukuina 1142 days ago

This should get you started: https://haystack.deepset.ai/tutorials/22_pipeline_with_promp...

link

KeplerBoy 1144 days ago

That's a lot of requests.

Not that it matters for the calculation, but i wonder how long such a request (ingesting 32k tokens and responding with a similar amount) would take.

At the speed of regular ChatGPT take would take a good while.

link

atq2119 1144 days ago

Batch processing scales quadratically with the context size (assuming OpenAI is still using standard transformer architecture) but the batch processing of the prompt is also fast compared to generating tokens because it's batched (parallel). So I wouldn't expect effective response times to go up quadratically. At most linearly, depending on the details of how they implement inference.

link

achandlerwhite 1144 days ago

Is it prorated for the actual context used for each request?

link

ZephyrBlu 1144 days ago

Yes, it's not a fixed price: https://openai.com/pricing.

Yes

It's exceedingly expensive and must surely come down over time.

link