Hacker News new | ask | show | jobs
by iamjackg 152 days ago
It's not unsolved, at least not the first part of your question. In fact it is a feature offered by all main LLM providers!

- https://platform.openai.com/docs/guides/prompt-caching

- https://platform.claude.com/docs/en/build-with-claude/prompt...

- https://ai.google.dev/gemini-api/docs/caching

2 comments

Ah, that's good to know, thanks.

But then why is there compounding token usage in the article's trivial solution? Is it just a matter of using the cache correctly?

Cached tokens are cheaper (90% discount ish) but not free
Also, unlike OpenAI, Anthropic's prompt caching is explicit (you set up to 4 cache "breakpoints"), meaning if you don't implement caching then you don't benefit from it.
thats a very generous way of putting it. Anthropic's prompt caching is actively hostile and very difficult to implement properly.
dumb question, but is prompt caching available to Claude Code … ?
If you're using the API, yes. If you have a subscription, you don't care, as you aren't billed per prompt (you just have a limit).