CLAUDE.md is static: same content every session (and a soft 40k character limit).
This is dynamic attention routing. Files get scored based on what you're actively discussing. Mention "auth" and auth-related docs go HOT (full injection), related files go WARM (headers only), and unrelated files stay COLD (evicted).
Scores decay over turns. If you stop talking about auth, those files fade back to COLD automatically.
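Rough sketch of the scoring and decay idea in Python. This is not the actual claude-cognitive code: the class name `AttentionRouter`, the thresholds, the decay factor, and the keyword matching are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Illustrative values only; the real thresholds and decay rate are assumptions.
HOT_THRESHOLD = 0.8
WARM_THRESHOLD = 0.25
DECAY = 0.7           # every score is multiplied by this once per turn
NEIGHBOR_BOOST = 0.5  # score given to files related to a file that was hit


@dataclass
class AttentionRouter:
    keywords: dict[str, set[str]]                               # path -> trigger words
    related: dict[str, set[str]] = field(default_factory=dict)  # path -> related paths
    scores: dict[str, float] = field(default_factory=dict)

    def update(self, message: str) -> None:
        """Advance one turn: decay every score, then boost files the message mentions."""
        words = set(re.findall(r"[a-z0-9_]+", message.lower()))
        for path in list(self.scores):
            self.scores[path] *= DECAY
        for path, triggers in self.keywords.items():
            if triggers & words:
                self.scores[path] = 1.0
                for neighbor in self.related.get(path, ()):
                    self.scores[neighbor] = max(
                        self.scores.get(neighbor, 0.0), NEIGHBOR_BOOST
                    )

    def tier(self, path: str) -> str:
        """Map a file's current score to HOT, WARM, or COLD."""
        score = self.scores.get(path, 0.0)
        if score >= HOT_THRESHOLD:
            return "HOT"
        if score >= WARM_THRESHOLD:
            return "WARM"
        return "COLD"
```

With these made-up numbers, a file that was HOT drops to WARM after one turn without a mention and to COLD after four.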
Plus multi-instance coordination: concurrent Claude sessions share completions and blockers so they don't duplicate work.
There's a 25k character limit on injection, so you compact less and stay focused where it's needed. I've also seen it help a lot with the context wobble that occurs after compacting.
Good question. I don't have a formal benchmark yet, but here's what I've measured in practice:
Token reduction: 64-95% depending on codebase size and work pattern. The variance comes from how many files are in your .claude/ directory and how focused your session is.
How to measure yourself:
1. Check `~/.claude/attention_history.jsonl` after a session
2. Run `python3 ~/.claude/scripts/history.py --since 2h` to see what got injected vs evicted
3. Compare your token counts before/after in Claude Code's usage stats
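For step 3, the comparison is just a percentage. A tiny helper, assuming you've copied the two token counts out of the usage stats by hand (the function name `token_reduction` is hypothetical and it does not read any claude-cognitive files):

```python
def token_reduction(baseline_tokens: int, routed_tokens: int) -> float:
    """Percent of tokens saved relative to the baseline session."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100 * (baseline_tokens - routed_tokens) / baseline_tokens


# Example: 40,000 tokens without routing vs 8,000 with it is an 80% reduction.
print(token_reduction(40_000, 8_000))
```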
The 25k character injection cap (adjustable) is the key constraint: it forces the router to prioritize ruthlessly instead of dumping everything in.
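A minimal sketch of how a character cap can force that prioritization. Again, this is an illustration and not the real implementation: `Candidate`, `headers_only`, and `select_for_injection` are hypothetical names, and treating "headers only" as Markdown heading lines is my assumption here.

```python
from dataclasses import dataclass

MAX_INJECTION_CHARS = 25_000  # the cap mentioned above; adjustable


@dataclass
class Candidate:
    path: str
    tier: str      # "HOT", "WARM", or "COLD"
    score: float
    content: str


def headers_only(content: str) -> str:
    # Assumption: "headers only" means the Markdown heading lines of the file.
    return "\n".join(line for line in content.splitlines() if line.startswith("#"))


def select_for_injection(candidates, budget=MAX_INJECTION_CHARS):
    """Return (path, text) pairs whose combined length stays within the budget."""
    ranked = sorted(
        (c for c in candidates if c.tier != "COLD"),
        key=lambda c: (c.tier != "HOT", -c.score),  # HOT first, then highest score
    )
    selected, used = [], 0
    for c in ranked:
        text = c.content if c.tier == "HOT" else headers_only(c.content)
        if used + len(text) > budget:
            continue  # does not fit; a smaller file further down may still fit
        selected.append((c.path, text))
        used += len(text)
    return selected
```

Skipping a file that doesn't fit and moving on is one possible policy; stopping at the first overflow or truncating the file are equally reasonable choices.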
A proper benchmark comparing baseline Claude Code vs claude-cognitive on identical tasks would be useful. If someone wants to build that, I'd happily collaborate.