| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by howinator 316 days ago
	I could be wrong, but I think this pricing is the first to admit that cost scales quadratically with number of tokens. It’s the first time I’ve seen nonlinear pricing from an LLM provider which implicitly mirrors the inference scaling laws I think we're all aware of.

2 comments

jpau 316 days ago

Google[1] also has a "long context" pricing structure. OpenAI may be considering offering similar since they do not offer their priority processing SLAs[2] for context >128K.

[1] https://cloud.google.com/vertex-ai/generative-ai/pricing

[2] https://openai.com/api-priority-processing/

link

energy123 316 days ago

Is this marginal pricing or if you go from 200,000 to 200,001 tokens your total costs double?

link