by westurner
372 days ago
Gemini, for example, launched implicit caching on or about 2025-05-08: https://developers.googleblog.com/en/gemini-2-5-models-now-s... :

> Now, when you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit. We will dynamically pass cost savings back to you, providing the same 75% token discount.

> In order to increase the chance that your request contains a cache hit, you should keep the content at the beginning of the request the same and add things like a user's question or other additional context that might change from request to request at the end of the prompt.

From https://news.ycombinator.com/item?id=43939774 re: same:

> Does this make it appear that the LLM's responses converge on one answer when actually it's just caching?
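
A minimal sketch of the prompt ordering the quoted post recommends: stable content first, per-request content last. This is plain string handling, not the Gemini API; `STATIC_PREFIX`, `build_prompt`, `build_prompt_question_first`, and `shared_prefix_len` are hypothetical names for illustration. It also measures shared prefixes in characters, whereas a real cache would presumably match on tokens, so treat the numbers as a rough proxy only.

```python
import os

# Hypothetical unchanging content (instructions, reference text).
STATIC_PREFIX = (
    "You are a support assistant.\n"
    "Reference manual:\n"
    "(long, unchanging document text goes here)\n"
)


def build_prompt(user_question: str) -> str:
    """Cache-friendly order: stable content first, variable content last."""
    return STATIC_PREFIX + "Question: " + user_question + "\n"


def build_prompt_question_first(user_question: str) -> str:
    """Cache-unfriendly order: the variable part leads, so prefixes diverge early."""
    return "Question: " + user_question + "\n" + STATIC_PREFIX


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


q1 = "How do I reset my password?"
q2 = "What is the refund policy?"

# Stable-first prompts share at least the whole static prefix.
good = shared_prefix_len(build_prompt(q1), build_prompt(q2))

# Question-first prompts share only the "Question: " label.
bad = shared_prefix_len(build_prompt_question_first(q1),
                        build_prompt_question_first(q2))

print(good, bad)
```

With the stable-first order, every request repeats the same long prefix, which is what the post says makes a request eligible for a cache hit. With the question first, the prompts diverge almost immediately.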