Hacker News
by minimaxir 672 days ago
Prompt caching has been a thing for LLMs since GPT-2 (e.g. the `use_cache=True` option in Hugging Face transformers); the bigger surprise is that it took this long for the main LLM providers to offer a good implementation.
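For anyone unfamiliar with the idea, here is a rough toy sketch of prefix caching in plain Python. It is not how any provider or library actually implements it: real systems cache per-layer key/value tensors, while this stand-in caches a single integer "state" per prefix. The names `PrefixCache`, `_step`, and `run` are made up for illustration. The point is only that a request sharing a prefix with an earlier one pays for its new tokens and nothing else.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy prompt cache (hypothetical): remembers the state reached after each seen prefix."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], int] = {}
        self.tokens_computed = 0  # counts how many "expensive" steps were run

    def _step(self, state: int, token: int) -> int:
        # Stand-in for the expensive per-token forward pass of a real model.
        self.tokens_computed += 1
        return (state * 31 + token) % 1_000_003

    def run(self, tokens: List[int]) -> int:
        seq = tuple(tokens)
        hit, state = 0, 0
        # Find the longest prefix of this request that was already processed.
        for n in range(len(seq), 0, -1):
            if seq[:n] in self._store:
                hit, state = n, self._store[seq[:n]]
                break
        # Only the tokens after the cached prefix need to be computed.
        for i in range(hit, len(seq)):
            state = self._step(state, seq[i])
            self._store[seq[: i + 1]] = state
        return state


# Assumed example: a shared 5-token "system prompt" followed by different user input.
system_prompt = [1, 2, 3, 4, 5]

cache = PrefixCache()
first = cache.run(system_prompt + [10, 11])   # cold: all 7 tokens computed
after_first = cache.tokens_computed
second = cache.run(system_prompt + [20])      # warm: only the 1 new token computed
after_second = cache.tokens_computed

# Same request on a fresh cache, to check that caching does not change the result.
uncached = PrefixCache().run(system_prompt + [20])
```

Storing an entry for every prefix keeps the lookup simple at the cost of memory, which is fine for a toy. A production cache would need eviction and a cheaper prefix lookup.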

I’m building an app with OpenAI, using structured outputs. Does OpenAI also support prompt caching?
I'm sure internally they use it for the system prompt at least, probably since launch. And maybe for common initial user queries that exactly match.
They are certainly not passing the savings on to the users.
Yet. I suspect OpenAI will release a similar offering soon. (hooray, free market competition!)
That $100 billion data center has to get paid for somehow.
Not currently.