by anthonypasq 11 hours ago
I mean, this is difficult to calculate because of prompt caching, the ratio of input/output tokens, etc., but if you just do some napkin math, I find it hard to believe people are getting this many tokens on a $20 plan.

Here's some napkin math.

gpt-oss-120b is priced at $0.039/$0.18 per million input/output tokens on OpenRouter. Here are some assumptions.

1. The ratio of input/output is about 25/1. (Coding is mostly grep and fairly low output.)

2. You are getting 75% prompt cache reads.

Case B: 50% prompt caching discount (standard provider rate)

At 75% prompt caching:

Total tokens obtained: 658,749,010 (approx. 659 million tokens)

Input: ~633mil

~475 mil cached at 50% input pricing = ~$9.25

~158 mil uncached = ~$6.15

Output: ~25 mil tokens (~$4.50)
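For anyone who wants to check the numbers, here is a minimal sketch of the same napkin math in Python. It only uses the figures quoted above ($20 budget, $0.039/$0.18 per million tokens, 25:1 input/output, 75% cache reads billed at a 50% discount); the function and key names are made up for illustration, and real provider billing may differ.

```python
# Napkin math: how many tokens does a fixed budget buy at per-token API prices?
# All inputs are the assumptions quoted in the comment; names are illustrative.

def tokens_for_budget(budget_usd, in_price, out_price,
                      in_out_ratio, cache_hit, cache_discount):
    """Return (input_millions, output_millions, cost_breakdown).

    Prices are in USD per million tokens.
    """
    # Cached input tokens are billed at a discounted rate.
    cached_price = in_price * (1 - cache_discount)
    # Average price of one million input tokens given the cache hit rate.
    blended_in = cache_hit * cached_price + (1 - cache_hit) * in_price
    # Cost of one million output tokens plus the input tokens that go with them.
    cost_per_out_million = in_out_ratio * blended_in + out_price

    out_m = budget_usd / cost_per_out_million
    in_m = in_out_ratio * out_m

    breakdown = {
        "cached_input": in_m * cache_hit * cached_price,
        "uncached_input": in_m * (1 - cache_hit) * in_price,
        "output": out_m * out_price,
    }
    return in_m, out_m, breakdown


if __name__ == "__main__":
    in_m, out_m, costs = tokens_for_budget(
        budget_usd=20.0, in_price=0.039, out_price=0.18,
        in_out_ratio=25, cache_hit=0.75, cache_discount=0.5,
    )
    print(f"input: {in_m:.0f}M, output: {out_m:.1f}M, total: {in_m + out_m:.0f}M")
    for name, usd in costs.items():
        print(f"{name}: ${usd:.2f}")
```

Under these assumptions the total comes out at roughly 659 million tokens, and the three cost lines add up to the $20 budget. Changing `cache_hit` or `in_out_ratio` shows how sensitive the total is to those two guesses.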

This doesn't even account for profit margins on inference providers, or the fact that OpenAI probably has a much more efficient inference stack.

It's really hard to know what these companies are actually paying, but from everything I'm hearing, people are reporting that API inference pricing carries about a 50% margin.

1 comments

I didn't say "use openrouter", as you might end up using subsidized resources; part of the argument is to avoid that and reach the true capital cost of inference per token (or something like that).

I meant, buy/lease the hardware that lets you run this model, run gpt-oss-120b and measure. I did this once and it was like 10x more expensive than any hosted alternative, and $20 wouldn't get you far there.

Here's the creator of OpenCode explaining how you are wrong:

https://youtu.be/1VqKUrxR2C8?si=uOAs_4XNXtTyTwCP&t=2195

He's either incompetent or lying.

An H100 today costs $2.95 an hour on vast.ai[1], which is already a good deal.

gpt-oss-120b on an H100 gives you ~200-250 tokens per second. I will be generous and say you can get a million tokens an hour out of it.

OpenCode Go (which I gladly pay for, in part because of this) is $10 a month. That's about three hours of H100 use, and the models you get there are more expensive than gpt-oss-120b. Sure, they have "scale" (although that doesn't apply to AI inference, but whatever) and this and that, but they're still pricing it 20-30x below their minimum threshold of capital expense.

Apples to apples: they sell you GLM 5.1 at $4.40 per million tokens. At ~50 tps on an H100 (being generous), it costs ~$16 to generate a million tokens.
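The rental arithmetic above can be sketched in a few lines of Python. This uses only the figures from the comment ($2.95/hour for an H100, ~250 tps for gpt-oss-120b, ~50 tps for GLM 5.1, $4.40 per million tokens, $10/month) and, like the comment, assumes the GPU serves a single stream at the quoted speed with no batching. The function name is made up for illustration.

```python
# Cost of generating one million tokens on a rented GPU,
# assuming a single stream at a fixed tokens-per-second rate (no batching).

def rental_cost_per_million(gpu_usd_per_hour, tokens_per_second):
    """Return USD per million tokens for a GPU rented by the hour."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    h100_hourly = 2.95  # vast.ai price quoted in the comment

    # gpt-oss-120b at the upper end of the quoted 200-250 tps range.
    print(f"gpt-oss-120b: ${rental_cost_per_million(h100_hourly, 250):.2f}/M")

    # GLM 5.1 at the "generous" 50 tps, compared with the $4.40/M list price.
    glm_cost = rental_cost_per_million(h100_hourly, 50)
    print(f"GLM 5.1: ${glm_cost:.2f}/M, {glm_cost / 4.40:.1f}x the $4.40 price")

    # How many H100 hours a $10/month subscription would pay for.
    print(f"$10 buys {10 / h100_hourly:.1f} H100 hours")
```

The result scales inversely with throughput, so anything that raises the effective tokens per second per GPU changes the conclusion proportionally.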

The math is simple and clear, they lose money.

1: https://vast.ai/pricing