| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by findjashua 115 days ago
	won't this essentially disable prompt caching, that you get from a standard append-only chat history?

3 comments

Pinkert 115 days ago

That's actually a great question. and the answer is yes and no; While it does disable the caching mechanism for the conversation history (and not for the system prompt, who remains constant), there is a difference between a chatbot with a constant chat history (just exchange of messages) and an agent who uses a large part of the conversation as a type of "scratchpad", sometimes even holding variables value in the beginning of the chat (to be sort of 'stateful'). if these variables change, the scratchpad changes (can be even 30%-40% of the entire conversation), there is a timeout in the cache (Claude gives you 5 minutes of cache for normal caching) or any other change to the exact history - you get a recaching of the entire conversation. additionally, caching still costs money.

The main advantage of the librarian is that is an 'insurance policy' for this caching mechanism. combining it with solving the context rot issue - and you get improved performance at scale.

link

geoffmanning 115 days ago

oh, one other caveat is that each request could result in the curation of system messages earlier in the chat message history, i haven't done a deep dive into prompt caching, but that could complicate things. the more i think about it, the more i wonder that the prompt caching is a patch for "dumb prompting" to try to save money when you're doing things the dumb way of throwing everything you have at it and praying it gets it right, when it'd just make more sense to keep the entirety of the prompt as lean as possible to prevent context rot and maximize signal to noise ratio.

link

geoffmanning 115 days ago

that's a good point, we haven't delved too deeply into prompt caching yet, but my understanding is that it only helps for a conversation that remains "hot", not one that a user just comes back to everyday and keep adding more to it over a longer period of time. i could see some optimization there where when the conversation is "hot" we keep the system message with the summarized index and all subsequent conversation messages that haven't been summarized intact until the conversation cools off.

link