Show HN: Claude Code's $200 plan is a 17× subsidy on the raw API | HN Mirror


	Show HN: Claude Code's $200 plan is a 17× subsidy on the raw API (github.com)
	9 points by Hiteshjain118 29 days ago

7 comments

Alex_toani 28 days ago

I've been reading reports suggesting that the major AI model companies aren't particularly profitable. The ones actually making money are those selling servers and providing power infrastructure. The bulk of these companies' revenue flows straight to those costs, leaving slim profit margins. That's to say nothing of services like Claude and ChatGPT with their subscription tiers, unless, of course, they happen to land a customer who pays $200 a month and barely touches the product.

vagrantJin 28 days ago

Dubious at best, when you remember that there are open models with performance just as good as Sonnet that cost $2.50-$5 per million output tokens. They don't have magically cheaper energy or compute costs. Anthropic's premium pricing on the API is just vibes. Like Starbucks. Like Lululemon. Like Apple.

photonair 29 days ago

Does this mean the Claude Code $200 plan really costs them more to run, especially for power users, or are they just ripping people off on API usage? If they are subsidizing the $200 plan, is it just a land grab for now, with price increases later?

Lionga 29 days ago

Clearly a land grab; they have reported billions in losses every quarter. To break even it would need to cost about $1000 a month, but then they would lose a lot of customers. The problem is they have no moat and will just burn billions of VC money only to lose customers later.

photonair 29 days ago

If I don't need something as powerful as GPT 5.5 or Claude, I could just use DeepSeek, Qwen, or the cheaper Chinese models. I think everyone is getting smart about routing their workload to models that are cheap but good enough to fulfill the request/task, and reserving the pricier ones like Claude for tasks needing higher intelligence.

Hiteshjain118 29 days ago

Yup, I have been testing Qwen and Kimi lately. I'm seeing accuracy from Qwen-Instruct (non-thinking) comparable to closed-source models. Here's a blog post we published on that: https://www.coralbricks.ai/blog/alphacumen-finance-benchmark...

mrkn1 28 days ago

A lot of my queries are summarize/explain/fact check, and these are covered 100% on my CPU locally [0], reducing frontier model reliance

[0] https://news.ycombinator.com/item?id=48301003

Hiteshjain118 28 days ago

The link in your HN comment takes me to your list of Show HN posts. I wasn't able to find your GitHub.

mrkn1 27 days ago

https://github.com/kouhxp/fftext

divyvasal 29 days ago

A 17x subsidy is crazy. It really highlights the hidden cost of context window re-evaluation in agent loops.

sibidharan 29 days ago

You mean if I use 2x $200 Max fully every week, I actually consume ~$5000 worth of API usage ?? Wow!!!

Hiteshjain118 29 days ago

My guess is you can push to $5k/month token usage with just a single Max subscription.

During my 30-day analysis window, I consumed $3,371 of tokens and didn't hit rate limits even once.

I plan to keep pushing my token usage higher until I hit rate limits at least 5-10 times in a month.
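
As a rough check on the headline number, the multiple is just API-rate usage divided by the plan price. A minimal sketch using only figures quoted in this thread (the function name is illustrative, not from the linked repo):

```python
# Figures quoted in the thread; nothing here is measured independently.
PLAN_PRICE_USD = 200    # monthly price of one Max subscription
API_EQUIV_USD = 3371    # API-rate value of tokens consumed in the 30-day window


def subsidy_multiple(api_equiv_usd: float, plan_price_usd: float) -> float:
    """Ratio of usage priced at API rates to what the subscriber actually paid."""
    return api_equiv_usd / plan_price_usd


multiple = subsidy_multiple(API_EQUIV_USD, PLAN_PRICE_USD)
print(f"{multiple:.1f}x")  # 16.9x, which rounds to the 17x in the title

# The guessed ceiling of $5k/month on a single subscription would be 25x.
ceiling = subsidy_multiple(5000, PLAN_PRICE_USD)
```

Whether this counts as a "subsidy" depends on what the tokens actually cost to serve, which the API price alone does not reveal.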

sibidharan 29 days ago

I hit the rate limit every other day... the 5-hour one... the weekly one... I consume my 20x weekly quota in 3 days! Hence having 2x 20x!

If I could harness $10000 worth of API usage... this is the best time, no idea how long we will get this subsidy! I won't pay $10000 out of my own pocket to do the same work!

Hiteshjain118 29 days ago

I'm curious what your token/$ usage looks like, if you'd be willing to share :)

sibidharan 28 days ago

TL;DR: ~$22,720 total compute @ Opus 4.7; if no caching, $113,418 (5.9x more). This is just one month on one server... I have 3 more servers like this where I work all the time!

// Generated with Claude

Ran this on my own ~/.claude/projects/ (933 sessions, 93,842 model calls, mix of main thread + spawned subagents). Numbers came out very close to yours in shape, different in scale.

cost.py (Opus 4.7 list rates, main thread only):

  cache reads (re-reading context)   21.69B tok   $10,843   56%
  cache writes (1h)                     678M tok   $6,781   35%
  output (incl. reasoning)             63.5M tok   $1,589    8%
  fresh uncached input                  1.6M tok       $8    0%
  TOTAL                              22.43B tok   $19,221

  if no caching: $113,418 (5.9x more)
  input:output ratio: 353:1
  cache hit rate: 97.0%
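
The cost.py table above can be reproduced approximately from its own token counts. The sketch below is not the repo's cost.py; the per-million-token rates are assumptions back-solved from the dollar figures in the table, and the token counts are the rounded values shown, so results land within rounding of the posted numbers rather than matching exactly.

```python
# Token counts in millions, copied from the cost.py output above (rounded).
TOKENS_M = {
    "cache_read": 21_690.0,
    "cache_write_1h": 678.0,
    "output": 63.5,
    "fresh_input": 1.6,
}

# ASSUMED USD rates per million tokens, inferred from the table's dollar
# column; not taken from an official price list.
RATES = {
    "cache_read": 0.50,
    "cache_write_1h": 10.00,
    "output": 25.00,
    "fresh_input": 5.00,
}


def cost_breakdown(tokens_m: dict, rates: dict) -> dict:
    """Dollar cost per token category at the given rates."""
    return {kind: tokens_m[kind] * rates[kind] for kind in tokens_m}


def total_input_m(tokens_m: dict) -> float:
    """All input-side tokens: cached reads, cache writes, and fresh input."""
    return (tokens_m["cache_read"] + tokens_m["cache_write_1h"]
            + tokens_m["fresh_input"])


def no_cache_cost(tokens_m: dict, rates: dict) -> float:
    """Cost if every input token were billed at the fresh-input rate."""
    return (total_input_m(tokens_m) * rates["fresh_input"]
            + tokens_m["output"] * rates["output"])


costs = cost_breakdown(TOKENS_M, RATES)
total = sum(costs.values())
uncached = no_cache_cost(TOKENS_M, RATES)
cache_hit_rate = TOKENS_M["cache_read"] / total_input_m(TOKENS_M)
io_ratio = total_input_m(TOKENS_M) / TOKENS_M["output"]

print(f"total ${total:,.0f}, no caching ${uncached:,.0f} "
      f"({uncached / total:.1f}x), hit rate {cache_hit_rate:.1%}, "
      f"input:output {io_ratio:.0f}:1")
```

The point the arithmetic makes is the same one the table does: with roughly 97% of input served from cache at a tenth of the fresh-input rate, caching accounts for almost the entire gap between the two totals.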

token_time_breakdown.py (179M unique tokens, 166h wall clock):

  reasoning (hidden thinking)   29% of tokens,  102h (61%) of time
  bash                           1.4%           23h (14%)
  writing tool calls             4%             14h  (8%)
  summaries                      2.5%            9h  (5%)
  reading/searching/web          1.6%            7h  (4%)
  subagents                      0.2%            6h  (4%)
  editing                        0.1%            5h  (3%)
  pasted attachments             25.3%           -
  typed prompt                   34.4%           -
  system+tools                   1.4%            -

reread_breakdown.py (per-activity share of billed input):

  reasoning           59.5%   (~$11.4k of the bill is re-sending old
                               hidden thinking back to the model)
  attachments         22.6%
  tool calls           7.8%
  bash                 3.0%
  reading/web          2.4%
  my prompt            1.6%
  summaries            1.5%
  system+tools         0.8%
  subagents            0.4%

main_vs_sidecar.py:

                          main         sidecar      combined
  sessions/agents          449         484           933
  assistant calls       63,820      30,022        93,842
  cache hit             97.0%        94.4%         96.6%
  turns/agent (mean)       142          62
  turns/agent (median)      20          44
  turns/agent (max)     11,058 (in one session)
  reasoning % of out    82%          51%           77%
  cost @ Opus 4.7    $19,225       $3,495       $22,720

  sidecar = 32% of calls but only 15% of cost. Subagents are doing
  their job (cheap, focused, short context).
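
The 32% / 15% split follows directly from the table. A minimal sketch using the call counts and costs listed above (variable and function names are illustrative):

```python
# Values copied from the main_vs_sidecar.py table above.
calls = {"main": 63_820, "sidecar": 30_022}
cost_usd = {"main": 19_225, "sidecar": 3_495}


def share(key: str, parts: dict) -> float:
    """Fraction of the combined total contributed by one entry."""
    return parts[key] / sum(parts.values())


call_share = share("sidecar", calls)      # about 0.32
cost_share = share("sidecar", cost_usd)   # about 0.15

# Average cost of a single model call in each lane.
cost_per_call = {lane: cost_usd[lane] / calls[lane] for lane in calls}

print(f"sidecar: {call_share:.0%} of calls, {cost_share:.0%} of cost")
print({lane: round(usd, 2) for lane, usd in cost_per_call.items()})
```

On these numbers a sidecar call averages roughly 12 cents against roughly 30 cents for a main-thread call, consistent with the shorter context subagents carry.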

Same shape as yours: re-read dominates, reasoning is the biggest re-read line, caching is the only thing keeping it sane. The one that surprised me was a single 11,058-turn main session - some autonomous loop I forgot to kill. Going to grep for that.

Repo: github.com/Coral-Bricks-AI/coral-ai/tree/main/claude-code-token-xray

Hiteshjain118 28 days ago

Thanks for sharing your workload. Impressive! And this is one person (you) steering all this token usage across the month? I want to double-click on one thing: if prices were to go up, you wouldn't be spending this many tokens. Would you just delay all those projects?

nbbaier 29 days ago

It's hard for me to figure out what to use all those tokens for

kpratik16 29 days ago

Interesting analysis that most token cost is due to re-reading the same context.

dnnddidiej 29 days ago

Subsidy? Sounds more like the raw API is just expensive.

l2s0 28 days ago

I agree with this. Until we have actual info about how much the API costs the big labs to serve, I wouldn't call it a subsidy.

Hiteshjain118 28 days ago

I think we will know the costs when they go public. Before that, we may have to infer costs through various signals.