|
TL;DR: ~$22,720 total compute @ Opus 4.7.
Main thread without caching: $113,418 (5.9x its $19,221). This is just one month on one server... I have 3 more servers like this where I work all the time! // Generated with Claude

Ran this on my own ~/.claude/projects/ (933 sessions, 93,842 model calls,
mix of main thread + spawned subagents). Numbers came out very close to
yours in shape, different in scale.

cost.py (Opus 4.7 list rates, main thread only):

  cache reads (re-reading context)   21.69B tok   $10,843   56%
  cache writes (1h)                    678M tok    $6,781   35%
  output (incl. reasoning)            63.5M tok    $1,589    8%
  fresh uncached input                 1.6M tok        $8    0%
  TOTAL                              22.43B tok   $19,221

  if no caching:       $113,418 (5.9x more)
  input:output ratio:  353:1
  cache hit rate:      97.0%
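
A rough sketch of how the cost.py summary lines fall out of the four token buckets. This is not the repo's cost.py: the per-million-token rates are back-derived from the rows above (dollars divided by tokens), so treat them as assumptions rather than an official price list, and the function name `summarize` is made up for illustration.

```python
# Sketch only: NOT the repo's cost.py. Rates are back-derived from the table
# rows (dollars / tokens), so they are assumptions, not an official price list.
RATES_PER_MTOK = {
    "cache_read": 0.50,
    "cache_write_1h": 10.00,
    "output": 25.00,
    "fresh_input": 5.00,
}

# Main-thread token counts from the table, in millions of tokens (rounded).
MTOK = {
    "cache_read": 21_690.0,
    "cache_write_1h": 678.0,
    "output": 63.5,
    "fresh_input": 1.6,
}


def summarize(mtok, rates):
    """Return total cost, the no-caching counterfactual, and the two ratios."""
    cost = {bucket: mtok[bucket] * rates[bucket] for bucket in mtok}
    total = sum(cost.values())
    input_mtok = mtok["cache_read"] + mtok["cache_write_1h"] + mtok["fresh_input"]
    # Counterfactual: every input token billed at the fresh-input rate.
    no_cache = input_mtok * rates["fresh_input"] + cost["output"]
    return {
        "total": total,
        "no_cache": no_cache,
        "multiplier": no_cache / total,
        "input_output_ratio": input_mtok / mtok["output"],
        "cache_hit_rate": mtok["cache_read"] / input_mtok,
    }


if __name__ == "__main__":
    for name, value in summarize(MTOK, RATES_PER_MTOK).items():
        print(f"{name:>20}: {value:,.3f}")
```

Because the inputs are the rounded figures from the table, the results land near the reported values (about $19.2k, about $113.4k, roughly 5.9x, roughly 97% hit rate) but will not match to the dollar.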

token_time_breakdown.py (179M unique tokens, 166h wall clock):

                                 % of tokens   time (% of wall clock)
  reasoning (hidden thinking)    29%           102h (61%)
  bash                           1.4%          23h (14%)
  writing tool calls             4%            14h (8%)
  summaries                      2.5%          9h (5%)
  reading/searching/web          1.6%          7h (4%)
  subagents                      0.2%          6h (4%)
  editing                        0.1%          5h (3%)
  pasted attachments             25.3%         -
  typed prompt                   34.4%         -
  system+tools                   1.4%          -

reread_breakdown.py (per-activity share of billed input):

  reasoning      59.5%  (~$11.4k of the bill is re-sending old hidden thinking back to the model)
  attachments    22.6%
  tool calls      7.8%
  bash            3.0%
  reading/web     2.4%
  my prompt       1.6%
  summaries       1.5%
  system+tools    0.8%
  subagents       0.4%

main_vs_sidecar.py:

                        main      sidecar   combined
  sessions/agents       449       484       933
  assistant calls       63,820    30,022    93,842
  cache hit             97.0%     94.4%     96.6%
  turns/agent           142       62
    (main: median 20, max 11,058 in one session; sidecar: median 44)
  reasoning % of out    82%       51%       77%
  cost @ Opus 4.7       $19,225   $3,495    $22,720

sidecar = 32% of calls but only 15% of cost. Subagents are doing
their job (cheap, focused, short context).
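
The 32% / 15% split can be checked straight from the table. A minimal sketch, with the figures copied from the main_vs_sidecar.py rows; the `shares` helper and its field names are hypothetical, not taken from the repo.

```python
# Sketch: figures copied from the main_vs_sidecar.py table; helper is illustrative.
ROWS = {
    "main": {"agents": 449, "calls": 63_820, "cost": 19_225},
    "sidecar": {"agents": 484, "calls": 30_022, "cost": 3_495},
}


def shares(rows):
    """Per-group share of calls and cost, plus cost per call and mean turns."""
    total_calls = sum(r["calls"] for r in rows.values())
    total_cost = sum(r["cost"] for r in rows.values())
    return {
        name: {
            "call_share": r["calls"] / total_calls,
            "cost_share": r["cost"] / total_cost,
            "cost_per_call": r["cost"] / r["calls"],
            "mean_turns": r["calls"] / r["agents"],
        }
        for name, r in rows.items()
    }


if __name__ == "__main__":
    for name, s in shares(ROWS).items():
        print(
            f"{name:>8}: {s['call_share']:.1%} of calls, "
            f"{s['cost_share']:.1%} of cost, "
            f"${s['cost_per_call']:.3f}/call, "
            f"{s['mean_turns']:.0f} turns/agent"
        )
```

On these numbers a sidecar call costs well under half of a main-thread call on average, which is consistent with the "short context" reading.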
Same shape as yours: re-read dominates, reasoning is the biggest
re-read line, caching is the only thing keeping it sane. The one
that surprised me was a single 11,058-turn main session - some
autonomous loop I forgot to kill. Going to grep for that.

Repo: github.com/Coral-Bricks-AI/coral-ai/tree/main/claude-code-token-xray |
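
One way to hunt for a runaway session like the 11,058-turn one, instead of plain grep. This is a sketch under assumptions: that each session is a JSONL transcript somewhere under ~/.claude/projects/, and that model replies are records with "type": "assistant". Both are guesses about the on-disk format, and the repo's scripts may count turns differently. `assistant_turns` is a made-up helper name.

```python
import json
from collections import Counter
from pathlib import Path


def assistant_turns(projects_dir):
    """Count records with type == "assistant" in each *.jsonl file (assumed format)."""
    counts = Counter()
    for path in Path(projects_dir).expanduser().rglob("*.jsonl"):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank, partial, or corrupt lines
                if isinstance(record, dict) and record.get("type") == "assistant":
                    counts[path] += 1
    return counts


if __name__ == "__main__":
    # Print the five transcripts with the most assistant records.
    for path, n in assistant_turns("~/.claude/projects").most_common(5):
        print(f"{n:>7}  {path}")
```

Sorting by count puts the outlier at the top, so a forgotten autonomous loop should stand out immediately against a median of 20 turns.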