I've been reading reports suggesting that the major AI model companies aren't particularly profitable. The ones actually making money are those selling servers and providing power infrastructure. The bulk of these companies' revenue flows straight to those costs, leaving slim profit margins. That's to say nothing of services like Claude and ChatGPT with their subscription tiers (unless, of course, they happen to land a customer who pays $200 a month and barely touches the product).
Dubious at best, when you remember that there are open models performing just as well as Sonnet that cost $2.50-$5 per million output tokens. Their providers don't have magically cheaper energy or compute costs. Anthropic's premium API pricing is just vibes. Like Starbucks. Like Lululemon. Like Apple.
Does this mean the Claude Code $200 plan really costs them more to run than they charge, especially for power users, or are they just ripping people off on API usage? If they are subsidizing the $200 plan, is it just a land grab for now, with price increases to come later?
Clearly a land grab; they have reported billions in losses every quarter. To break even, the plan would need to cost about $1,000 a month, but then they would lose a lot of customers. The problem is they have no moat and will just burn billions of VC money only to lose customers later.
If I don't need something as powerful as GPT 5.5 or Claude, I can just use Deepseek, Qwen, or the other cheaper Chinese models. I think everyone is getting smart about routing their workload to models that are cheap but good enough to fulfill the request or task, and reserving the pricier ones like Claude for tasks that need higher intelligence.
I hit the rate limit every other day... the 5-hour one... the weekly one... I burn through my 20x weekly allowance in 3 days! So I have 2x 20x!
If I can harness $10,000 worth of API usage... this is the best time; no idea how long we will get this subsidy! I won't pay $10,000 out of my own pocket to do the same work!
TL;DR = ~$22,720 total compute @ Opus 4.7
if no caching = $113,418 (5.9x more) - this is just one month on one server... I have 3 more servers like this where I work all the time!
// Generated with Claude
Ran this on my own ~/.claude/projects/ (933 sessions, 93,842 model calls,
mix of main thread + spawned subagents). Numbers came out very close to
yours in shape, different in scale.
cost.py (Opus 4.7 list rates, main thread only):
cache reads (re-reading context)   21.69B tok   $10,843   56%
cache writes (1h)                    678M tok    $6,781   35%
output (incl. reasoning)            63.5M tok    $1,589    8%
fresh uncached input                 1.6M tok        $8    0%
TOTAL                              22.43B tok   $19,221
if no caching: $113,418 (5.9x more)
input:output ratio: 353:1
cache hit rate: 97.0%
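The arithmetic behind the table above can be reproduced with a few lines of Python. This is a minimal sketch, not the actual cost.py: the per-million rates are back-derived from the rows themselves (for example $10,843 / 21.69B tok is about $0.50/M), not copied from a price sheet, and the function names are made up for illustration.

```python
# Per-million-token rates in USD.
# ASSUMPTION: back-derived from the table rows (cost / tokens),
# not taken from an official price list.
RATES = {
    "cache_read": 0.50,
    "cache_write_1h": 10.00,
    "output": 25.00,
    "fresh_input": 5.00,
}

# Token counts in millions, as reported in the table (rounded).
TOKENS_M = {
    "cache_read": 21_690,
    "cache_write_1h": 678,
    "output": 63.5,
    "fresh_input": 1.6,
}

INPUT_KEYS = ("cache_read", "cache_write_1h", "fresh_input")


def cost_with_caching(tokens_m, rates):
    """Bill each token category at its own rate."""
    return sum(tokens_m[k] * rates[k] for k in tokens_m)


def cost_without_caching(tokens_m, rates):
    """Bill every input token at the fresh-input rate; output is unchanged."""
    input_m = sum(tokens_m[k] for k in INPUT_KEYS)
    return input_m * rates["fresh_input"] + tokens_m["output"] * rates["output"]


def cache_hit_rate(tokens_m):
    """Share of input tokens served from cache."""
    return tokens_m["cache_read"] / sum(tokens_m[k] for k in INPUT_KEYS)


cached = cost_with_caching(TOKENS_M, RATES)
uncached = cost_without_caching(TOKENS_M, RATES)
print(f"with caching:    ${cached:,.0f}")
print(f"without caching: ${uncached:,.0f} ({uncached / cached:.1f}x more)")
print(f"cache hit rate:  {cache_hit_rate(TOKENS_M):.1%}")
```

With the rounded token counts this lands within a few dollars of the reported totals (about $19.2k with caching, about $113.4k without), which suggests the table is internally consistent.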
reread_breakdown.py (per-activity share of billed input):
reasoning      59.5%  (~$11.4k of the bill is re-sending old hidden thinking back to the model)
attachments    22.6%
tool calls      7.8%
bash            3.0%
reading/web     2.4%
my prompt       1.6%
summaries       1.5%
system+tools    0.8%
subagents       0.4%
main_vs_sidecar.py:
                     main      sidecar   combined
sessions/agents      449       484       933
assistant calls      63,820    30,022    93,842
cache hit            97.0%     94.4%     96.6%
turns/agent          142       62
  median             20        44
  max                11,058 (in one session)
reasoning % of out   82%       51%       77%
cost @ Opus 4.7      $19,225   $3,495    $22,720
sidecar = 32% of calls but only 15% of cost. Subagents are doing
their job (cheap, focused, short context).
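The 32% / 15% split can be checked directly from the table figures. A small sketch, with hypothetical helper names, that also works out the average cost per call on each side:

```python
# Figures copied from the main_vs_sidecar.py table above.
CALLS = {"main": 63_820, "sidecar": 30_022}
COST_USD = {"main": 19_225, "sidecar": 3_495}


def share(values, key):
    """Fraction of the total attributable to `key`."""
    return values[key] / sum(values.values())


def cost_per_call(key):
    """Average USD per assistant call for one side."""
    return COST_USD[key] / CALLS[key]


sidecar_call_share = share(CALLS, "sidecar")
sidecar_cost_share = share(COST_USD, "sidecar")

print(f"sidecar share of calls: {sidecar_call_share:.0%}")
print(f"sidecar share of cost:  {sidecar_cost_share:.0%}")
for side in ("main", "sidecar"):
    print(f"{side:8s} ${cost_per_call(side):.2f} per call")
```

On these numbers a sidecar call averages roughly 12 cents against roughly 30 cents for a main-thread call, which fits the short-context explanation.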
Same shape as yours: re-read dominates, reasoning is the biggest
re-read line, caching is the only thing keeping it sane. The one
that surprised me was a single 11,058-turn main session - some
autonomous loop I forgot to kill. Going to grep for that.
Thanks for sharing your workload. Impressive! And this is one person (you) steering all this token usage across the month? One thing I want to double-click on: if prices were to go up, you wouldn't be spending this many tokens. Would you just delay all those projects?