by brookst
60 days ago
My hypothesis is that people with continuous sessions that keep the cache valid see the behavior you’re describing: at 95% cache hits (or thereabouts), the max plan goes a long way. But for people who go more than 5 minutes between prompts and get no cache hits, usage is eaten up quickly, especially when passing in hundreds of thousands of tokens of conversation history. I know my quota goes a lot further when I sit down and keep sessions active, and much less far when I’m distracted and let it sit for 10+ minutes between queries. It’s a guess. But with n=1 and possible confirmation bias noted, it’s what I’m seeing.
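
To put rough numbers on that guess, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the 200,000-token history, and especially the 0.1 weight for a cached token, since how plan usage is actually metered is not something I know.

```python
def effective_input_tokens(context_tokens: int,
                           cache_hit_rate: float,
                           cache_read_weight: float = 0.1) -> float:
    """Estimate how many 'full price' input tokens one prompt counts as.

    cache_read_weight is an ASSUMED discount: a cached token is counted
    as 10% of an uncached one. Real plan accounting may differ.
    """
    cached = context_tokens * cache_hit_rate
    uncached = context_tokens - cached
    return uncached + cached * cache_read_weight


history = 200_000  # hypothetical conversation history, in tokens

warm = effective_input_tokens(history, cache_hit_rate=0.95)  # session kept active
cold = effective_input_tokens(history, cache_hit_rate=0.0)   # cache expired

print(f"warm cache: {warm:,.0f} token-equivalents per prompt")
print(f"cold cache: {cold:,.0f} token-equivalents per prompt")
print(f"cold / warm: {cold / warm:.1f}x")
```

Under those assumed numbers, a prompt sent after the cache has lapsed costs about 6.9x what the same prompt costs at a 95% hit rate, which would be in line with what I'm seeing. It ignores output tokens and any surcharge for writing the cache, so treat it as an order-of-magnitude illustration only.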