Good question. I don't have a formal benchmark yet, but here's what I've measured in practice:
Token reduction: 64-95%, depending on codebase size and work pattern. The spread comes from how many files are in your .claude/ directory and how focused your session is.
How to measure yourself:
1. Check `~/.claude/attention_history.jsonl` after a session
2. Run `python3 ~/.claude/scripts/history.py --since 2h` to see what got injected vs evicted
3. Compare your token counts before/after in Claude Code's usage stats
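For step 3, the reduction figure is just a percentage change between two token counts. Here is a minimal sketch, assuming you copy the two numbers by hand from the usage stats. The function name `token_reduction_pct` is made up for illustration and is not part of claude-cognitive.

```python
def token_reduction_pct(baseline_tokens: int, reduced_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline session.

    Both counts are assumed to be read manually from the usage stats;
    this helper is hypothetical and not part of any tool.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100 * (baseline_tokens - reduced_tokens) / baseline_tokens
```

With made-up numbers, a baseline session of 100,000 tokens and a comparable session of 36,000 tokens gives `token_reduction_pct(100_000, 36_000) == 64.0`. The comparison only means something if both sessions cover roughly the same task.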
The 25k character injection cap (adjustable) is the key constraint: it forces the router to prioritize ruthlessly instead of dumping everything in.
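To illustrate what a character cap does to selection, here is a generic greedy sketch. It is not the router's actual algorithm: the `Candidate` type, the `score` field, and `select_within_cap` are all hypothetical, and the only detail taken from above is the 25,000-character default.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical unit of context competing for injection."""
    name: str
    score: float  # assumed relevance score; higher means more relevant
    text: str


def select_within_cap(candidates, cap_chars=25_000):
    """Greedily keep the highest-scoring candidates that fit under the cap.

    Returns the chosen candidates and the total characters used.
    """
    chosen, used = [], 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if used + len(cand.text) <= cap_chars:
            chosen.append(cand)
            used += len(cand.text)
    return chosen, used
```

Given three candidates of 15k, 12k, and 8k characters scored 0.9, 0.7, and 0.5, this keeps the first and third (23k total) and drops the second because it would overshoot the cap. Whatever the real scoring looks like, the point is the same: a hard budget forces a ranking.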
A proper benchmark comparing baseline Claude Code vs claude-cognitive on identical tasks would be useful. If someone wants to build that, I'd happily collaborate.