| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by kaliades 61 days ago

Exact match, word for word. agent-cache takes everything that defines an LLM request - which model you're calling (gpt-4o, Claude, etc.), the full conversation history (system prompt + user messages + assistant responses), sampling parameters like temperature, and any tool/function definitions the model has access to - serializes it all into a canonical JSON string with sorted keys, and hashes it with SHA-256. That hash is the cache key in Valkey. Same inputs down to the last character = cache hit, anything different = miss.

If you want the 'basically the same question' behavior, that's our other package - @betterdb/semantic-cache. It embeds the prompt as a vector and does similarity search, so 'What is the capital of France?' and 'Capital city of France?' both hit. The trade-off is it needs valkey-search for the vector index, while agent-cache works on completely vanilla Valkey with no modules.

In practice, agent-cache hits its cache less often than semantic-cache would, but when it does hit, you know the result is correct - there's no chance of returning a response for a question that was similar but not actually the same.

1 comments

traceseal 60 days ago

Do you automate the same prompt twice and if you already burn the tokens to get the first answer what the second answer gain do you record it to try and teach your agents for future to look locally or?

link