| HN Mirror

by burtonator 672 days ago

Autoregressive models can't just resume from an earlier request, so they have to reprocess the entire prompt on every execution.

With the processed prompt cached, the model resumes from where it left off, bypassing all of that computation.

For large contexts this could save a ton of compute!
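
To make the idea concrete, here is a toy sketch in plain Python, not any provider's actual implementation. It assumes a single attention layer, and `ToyModel`, `embed`, `extend`, and `last_output` are hypothetical names made up for illustration. Each token's key/value pair depends only on that token, so the pairs for a shared prefix can be computed once and reused by every request that starts with it.

```python
import math

def embed(token, dim=4):
    # Hypothetical deterministic "embedding"; stands in for learned weights.
    seed = sum(map(ord, token))
    return [math.sin((i + 1) * seed) for i in range(dim)]

class ToyModel:
    def __init__(self):
        self.kv_computed = 0  # counts per-token key/value computations

    def kv(self, token):
        # The "expensive" per-token work that a cache lets us skip.
        self.kv_computed += 1
        e = embed(token)
        return [2.0 * x for x in e], [x + 1.0 for x in e]

    def extend(self, cache, tokens):
        # Return a new cache with keys/values for `tokens` appended.
        keys, values = cache
        new = [self.kv(t) for t in tokens]
        return keys + [k for k, _ in new], values + [v for _, v in new]

    def last_output(self, cache, last_token):
        # Softmax attention for the final position over all cached keys/values.
        keys, values = cache
        q = embed(last_token)
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        return [sum(w * v[i] for w, v in zip(weights, values)) / total
                for i in range(len(q))]

system_prompt = "you are a helpful assistant".split()  # 5 tokens
question = "what is caching".split()                   # 3 tokens

# No cache: every token of the prompt is processed for this request.
cold = ToyModel()
cold_cache = cold.extend(([], []), system_prompt + question)
out_cold = cold.last_output(cold_cache, question[-1])

# Cached prefix: the system prompt was processed once by an earlier request.
warm = ToyModel()
prefix_cache = warm.extend(([], []), system_prompt)
warm.kv_computed = 0  # count only the work done for the new request
warm_cache = warm.extend(prefix_cache, question)
out_warm = warm.last_output(warm_cache, question[-1])

print(cold.kv_computed, warm.kv_computed)
print(out_cold == out_warm)
```

The cached run does per-token work only for the new tokens and produces the same output. In a real multi-layer transformer the cached state is larger (keys and values for every layer), but the same reuse is possible because causal attention means a prefix's state never depends on the tokens that follow it.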

I think this feature and structured outputs are some of the biggest inventions in LLMs this year.

1 comments

minimaxir 672 days ago

Prompt caching has been a thing for LLMs since GPT-2 (e.g. transformers' `use_cache=True`); it's more of a surprise that it took this long for the main LLM providers to provide a good implementation.

brylie 672 days ago

I’m building an app with OpenAI, using structured outputs. Does OpenAI also support prompt caching?

cma 672 days ago

I'm sure internally they use it for the system prompt at least, probably since launch. And maybe for common initial user queries that exactly match.

Onavo 672 days ago

They are certainly not passing the savings on to the users.

minimaxir 672 days ago

Yet. I suspect OpenAI will release a similar offering soon. (hooray, free market competition!)

HeatrayEnjoyer 672 days ago

That $100 billion data center has to get paid for somehow.

minimaxir 672 days ago

Not currently.
