by moralestapia 12 hours ago
I didn't say "use OpenRouter", since you might end up using subsidized resources. Part of the argument is to avoid that and get at the true capital cost of inference per token (or something like that).

I meant, buy/lease the hardware that lets you run this model, run gpt-oss-120b and measure. I did this once and it was like 10x more expensive than any hosted alternative, and $20 wouldn't get you far there.

1 comments

Here's the creator of OpenCode explaining how you are wrong:

https://youtu.be/1VqKUrxR2C8?si=uOAs_4XNXtTyTwCP&t=2195

He's either incompetent or lying.

An H100 today costs $2.95 an hour on vast.ai[1], which is already a good deal.

gpt-oss-120b on an H100 gives you ~200-250 tokens per second. I will be generous and say you can get a million tokens an hour out of it.

OpenCode Go (which I gladly pay for, in part because of this) is $10 a month. That's about three hours of H100 use, and the models you get there are more expensive than gpt-oss-120b. Sure, they have "scale" (although that doesn't apply to AI inference, but whatever) and this and that, but they're still pricing it 20-30x below their minimum threshold of capital expense.
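A rough sketch of the "three hours" figure, using only the numbers quoted in this comment. The constant and function names are made up for illustration, and it assumes the hourly rental price is the only cost and that the GPU sustains the generous one-million-tokens-per-hour rate.

```python
# Figures quoted in the comment; treat them as the commenter's estimates.
H100_USD_PER_HOUR = 2.95      # vast.ai rental price cited in [1]
SUBSCRIPTION_USD = 10.00      # OpenCode Go monthly price
TOKENS_PER_HOUR = 1_000_000   # the "generous" gpt-oss-120b throughput estimate


def gpu_hours(budget_usd: float, usd_per_hour: float) -> float:
    """Hours of GPU rental that a given budget covers."""
    return budget_usd / usd_per_hour


hours = gpu_hours(SUBSCRIPTION_USD, H100_USD_PER_HOUR)
tokens = hours * TOKENS_PER_HOUR  # tokens generated in that many hours

print(f"{hours:.2f} GPU-hours, about {tokens / 1e6:.2f}M tokens")
```

Under those assumptions, a month's subscription covers a bit under three and a half hours of rented H100 time.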

Apples to apples: they sell you GLM 5.1 at $4.40 per million tokens. At ~50 tps on an H100 (being generous), it costs ~$16 to do a million tokens.
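The ~$16 figure can be reproduced with a small sketch. The function name is hypothetical, and it assumes the rented H100 serves tokens at the quoted rate with no other costs counted.

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Rental cost of generating one million tokens at a steady rate."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


H100_USD_PER_HOUR = 2.95      # vast.ai price cited in [1]
GLM_TOKENS_PER_SECOND = 50    # the "generous" throughput estimate from the comment
GLM_PRICE_PER_MILLION = 4.40  # quoted selling price per million tokens

glm_cost = usd_per_million_tokens(H100_USD_PER_HOUR, GLM_TOKENS_PER_SECOND)
ratio = glm_cost / GLM_PRICE_PER_MILLION  # cost relative to the selling price

print(f"cost ${glm_cost:.2f} per 1M tokens vs price ${GLM_PRICE_PER_MILLION:.2f} ({ratio:.1f}x)")
```

With these inputs the rental cost per million tokens comes out at several times the quoted selling price.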

The math is simple and clear: they lose money.

1: https://vast.ai/pricing