by kurige
55 days ago
> This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact. I see this repeated by others, including coworkers. It completely ignores caching. Caching itself is complicated, but the "longer context window = more expensive" is not 100% true and you are hampering yourself if you're not taking full advantage of large context windows.
The default Claude cache expires after 5 minutes of inactivity. If you take a short break to review the code, talk to someone, or do anything other than continuously interact with the session, the cache gets evicted and the next turn has to process and re-cache the entire context from scratch.
You can opt in to a 1-hour cache at a higher rate: https://platform.claude.com/docs/en/build-with-claude/prompt...
Also anecdotally, caching has just been broken at times for me. I've had active conversations where turns less than 5 minutes apart were consuming so much quota that I doubt anything was being billed at the cache rate.
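To put rough numbers on the break scenario, here is a back-of-the-envelope sketch. The multipliers are my assumptions about how cache writes and reads are priced relative to base input tokens (1.25x for a 5-minute write, 2x for a 1-hour write, 0.1x for a read), so check the current pricing page before trusting them. `turn_cost` is a made-up helper, not part of any SDK, and it only counts the re-sent context, not new input or output tokens.

```python
# Assumed price multipliers relative to one base input token.
# These are illustrative assumptions; verify against current pricing.
CACHE_WRITE_5M = 1.25  # writing the prefix into the 5-minute cache
CACHE_WRITE_1H = 2.0   # writing the prefix into the 1-hour cache
CACHE_READ = 0.1       # reading a prefix that is still cached

def turn_cost(context_tokens, gap_seconds, ttl_seconds, write_multiplier):
    """Relative cost, in base-input-token units, of re-sending the
    existing context on the next turn (hypothetical helper)."""
    if gap_seconds < ttl_seconds:
        # Cache still warm: the prefix is billed at the read rate.
        return context_tokens * CACHE_READ
    # Cache expired: the whole prefix is processed and written again.
    return context_tokens * write_multiplier

context = 200_000   # tokens already in the session
coffee_break = 600  # a 10-minute gap between turns, in seconds

print(turn_cost(context, coffee_break, 5 * 60, CACHE_WRITE_5M))   # 5-minute cache: miss
print(turn_cost(context, coffee_break, 60 * 60, CACHE_WRITE_1H))  # 1-hour cache: hit
```

Under those assumptions, one 10-minute break on a 200K context costs more than ten times as much on the default cache as it does with the 1-hour cache still warm. The 1-hour option only pays off if you actually take breaks like that, since every write to it is billed at the higher rate.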