TL;DR: ~$22,720 total compute at Opus 4.7 list rates.
Without caching, the main thread alone would cost $113,418 (5.9x its $19,221). This is just one month on one server... I have 3 more servers like this, where I work all the time!
// Generated with Claude
Ran this on my own ~/.claude/projects/ (933 sessions, 93,842 model calls,
mix of main thread + spawned subagents). Numbers came out very close to
yours in shape, different in scale.
cost.py (Opus 4.7 list rates, main thread only):
  cache reads (re-reading context)   21.69B tok   $10,843   56%
  cache writes (1h)                    678M tok    $6,781   35%
  output (incl. reasoning)            63.5M tok    $1,589    8%
  fresh uncached input                 1.6M tok        $8    0%
  TOTAL                              22.43B tok   $19,221

  if no caching:       $113,418 (5.9x more)
  input:output ratio:  353:1
  cache hit rate:      97.0%
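The rows above can be re-derived with a few lines of arithmetic. A minimal sketch, assuming per-million-token rates back-derived from the table itself (each row's dollars divided by its tokens) rather than read from an official price sheet. RATES, TOKENS and the function names are illustrative; this is not the actual cost.py.

```python
# Dollars per million tokens, implied by the table rows above
# (assumption: derived as dollars / tokens, not from a price list).
RATES = {"cache_read": 0.50, "cache_write_1h": 10.00, "output": 25.00, "input": 5.00}

# Token counts in millions, as reported in the table (rounded).
TOKENS = {"cache_read": 21_690.0, "cache_write_1h": 678.0, "output": 63.5, "input": 1.6}

INPUT_KINDS = ("cache_read", "cache_write_1h", "input")


def cost_with_caching(tokens, rates):
    """Bill each token class at its own rate."""
    return sum(tokens[kind] * rates[kind] for kind in tokens)


def cost_without_caching(tokens, rates):
    """Bill every input token at the fresh-input rate; output is unchanged."""
    input_tokens = sum(tokens[kind] for kind in INPUT_KINDS)
    return input_tokens * rates["input"] + tokens["output"] * rates["output"]


def cache_hit_rate(tokens):
    """Share of input tokens that were served as cache reads."""
    return tokens["cache_read"] / sum(tokens[kind] for kind in INPUT_KINDS)


cached = cost_with_caching(TOKENS, RATES)
uncached = cost_without_caching(TOKENS, RATES)
print(f"with caching:    ${cached:,.0f}")
print(f"without caching: ${uncached:,.0f} ({uncached / cached:.1f}x more)")
print(f"cache hit rate:  {cache_hit_rate(TOKENS):.1%}")
```

Because the token counts are rounded, the results land within rounding distance of the table rather than matching it to the dollar.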
reread_breakdown.py (per-activity share of billed input):
  reasoning      59.5%  (~$11.4k of the bill is re-sending old hidden thinking back to the model)
  attachments    22.6%
  tool calls      7.8%
  bash            3.0%
  reading/web     2.4%
  my prompt       1.6%
  summaries       1.5%
  system+tools    0.8%
  subagents       0.4%
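The ~$11.4k figure appears to be the 59.5% reasoning share applied to the whole main-thread bill. A quick check under that reading, with the input-only variant (total minus the output row) shown for comparison. Variable names are illustrative.

```python
MAIN_THREAD_TOTAL = 19_221  # TOTAL row of cost.py, in dollars
OUTPUT_COST = 1_589         # output row of cost.py, in dollars
REASONING_SHARE = 0.595     # reasoning row of reread_breakdown.py

# Share applied to the full bill (matches the quoted ~$11.4k).
reasoning_of_total = REASONING_SHARE * MAIN_THREAD_TOTAL

# Share applied to input rows only (assumption: "billed input" excludes output).
reasoning_of_input = REASONING_SHARE * (MAIN_THREAD_TOTAL - OUTPUT_COST)

print(f"of total bill:   ${reasoning_of_total:,.0f}")
print(f"of billed input: ${reasoning_of_input:,.0f}")
```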
main_vs_sidecar.py:
                          main    sidecar   combined
  sessions/agents          449        484        933
  assistant calls       63,820     30,022     93,842
  cache hit              97.0%      94.4%      96.6%
  turns/agent              142         62
  reasoning % of out       82%        51%        77%
  cost @ Opus 4.7      $19,225     $3,495    $22,720

  turns/agent detail: main median 20, max 11,058 in one session; sidecar median 44
sidecar = 32% of calls but only 15% of cost. Subagents are doing
their job (cheap, focused, short context).
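The 32% / 15% split and the turns/agent means follow directly from the table. A small sketch using only the figures reported above; the Bucket class and its field names are illustrative, not part of main_vs_sidecar.py.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    agents: int   # sessions (main) or spawned agents (sidecar)
    calls: int    # assistant calls
    cost: float   # dollars at the rates used above

    @property
    def turns_per_agent(self):
        return self.calls / self.agents

    @property
    def cost_per_call(self):
        return self.cost / self.calls


main = Bucket(agents=449, calls=63_820, cost=19_225)
sidecar = Bucket(agents=484, calls=30_022, cost=3_495)

calls_share = sidecar.calls / (main.calls + sidecar.calls)
cost_share = sidecar.cost / (main.cost + sidecar.cost)

print(f"sidecar share of calls: {calls_share:.0%}")
print(f"sidecar share of cost:  {cost_share:.0%}")
print(f"mean turns/agent: main {main.turns_per_agent:.0f}, sidecar {sidecar.turns_per_agent:.0f}")
print(f"cost per call: main ${main.cost_per_call:.2f}, sidecar ${sidecar.cost_per_call:.2f}")
```

Cost per call is the more direct way to see the "cheap, focused, short context" point: a sidecar call costs well under half of a main-thread call on these numbers.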
Same shape as yours: re-read dominates, reasoning is the biggest
re-read line, caching is the only thing keeping it sane. The one
that surprised me was a single 11,058-turn main session - some
autonomous loop I forgot to kill. Going to grep for that.
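One way to hunt for that runaway session, instead of grep, is to count assistant entries per transcript file. A minimal sketch, assuming each session is a *.jsonl file under ~/.claude/projects/ with one JSON object per line and a top-level "type" field equal to "assistant" for model calls. That layout is an assumption about the log format, and the function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def assistant_turns_per_session(root):
    """Count assistant entries in every *.jsonl transcript under root."""
    counts = Counter()
    for path in Path(root).expanduser().rglob("*.jsonl"):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank, partial, or corrupt lines
                # Assumed schema: model calls carry type == "assistant".
                if isinstance(entry, dict) and entry.get("type") == "assistant":
                    counts[str(path)] += 1
    return counts
```

Calling `assistant_turns_per_session("~/.claude/projects").most_common(5)` would list the five longest transcripts by assistant turns, which should surface an 11,058-turn outlier immediately if the schema assumption holds.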
Thanks for sharing your workload. Impressive! And is it just one person (you) steering all this token usage across the month? I want to double-click on that: if prices were to go up, you wouldn't be spending this many tokens. Would you just delay all those projects?
It's just me steering all the tokens... Multiple projects for my bootstrapped Academy, running parallel research on things I wouldn't do myself, and deriving the right patterns for the future. Using different Max subscriptions on the same machine. I have a similar ~$22K on another machine, where I am working in parallel with the same 2xMax subscriptions!
So if prices were to go up, I honestly have no idea at the moment. I might need a rehabilitation centre!!!
But I am mentally prepared that someday this will be gone... So make hay while the sun shines! I am doing the most complex work I want to do... and pushing the limits. Knowing how much it costs in actual tokens gives me some kind of seriousness, like an invisible funding that I must use to grow!! I consider this a launchpad... If any of the projects hit and scale and I get some funding in future, then I might not need to worry when prices go up. So either this or rehab lol!
token_time_breakdown.py: 179M unique tokens, 166h wall clock.
Repo: github.com/Coral-Bricks-AI/coral-ai/tree/main/claude-code-token-xray