|
"On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6" This makes no sense to me. I often leave sessions idle for hours or days and use the capability to pick it back up with full context and power. The default thinking level seems more forgivable, but the churn in system prompts is something I'll need to figure out how to intentionally choose a refresh cycle. |
Normally, when you have a conversation with Claude Code, if your convo has N messages, then (N-1) messages hit prompt cache -- everything but the latest message.
The challenge is: when you let a session idle for >1 hour, when you come back to it and send a prompt, it will be a full cache miss, all N messages. We noticed that this corner case led to outsized token costs for users. In an extreme case, if you had 900k tokens in your context window, then idled for an hour, then sent a message, that would be >900k tokens written to cache all at once, which would eat up a significant % of your rate limits, especially for Pro users.
We tried a few different approaches to improve this UX:
1. Educating users on X/social
2. Adding an in-product tip to recommend running /clear when re-visiting old conversations (we shipped a few iterations of this)
3. Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.
Hope this is helpful. Happy to answer any questions if you have.