Hacker News
by nok22kon 3 hours ago
if you do the electricity math you'll see that running models locally costs more and gets you lower quality (local runs are more heavily quantized) than using OpenRouter.

I'm not comparing local Gemma/Qwen against cloud Opus, but against the same Gemma/Qwen models on OpenRouter

there are reasons to run locally (privacy, availability), but cost is not one of them
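A minimal sketch of the electricity math in Python, for anyone who wants to run their own numbers. It assumes the GPU draws a constant average power and produces a steady average token throughput, and it counts electricity only (no GPU purchase cost). The function names and the inputs in the usage line (400 W, $0.20/kWh, 100 tokens/s) are hypothetical placeholders, not figures from this thread.

```python
# Minimal sketch of the electricity math for local inference.
# Assumption: constant average power draw and steady average throughput.
# All example inputs are hypothetical placeholders.

SECONDS_PER_HOUR = 3600


def daily_electricity_cost(avg_power_watts: float, price_per_kwh: float,
                           hours_per_day: float = 24.0) -> float:
    """Electricity cost of running the GPU for hours_per_day."""
    kwh = avg_power_watts / 1000.0 * hours_per_day
    return kwh * price_per_kwh


def tokens_per_day(tokens_per_second: float, hours_per_day: float = 24.0) -> float:
    """Tokens produced at a steady average throughput."""
    return tokens_per_second * hours_per_day * SECONDS_PER_HOUR


def electricity_cost_per_million_tokens(avg_power_watts: float,
                                        price_per_kwh: float,
                                        tokens_per_second: float) -> float:
    """Electricity-only cost per 1M generated tokens (GPU price excluded)."""
    cost = daily_electricity_cost(avg_power_watts, price_per_kwh)
    tokens = tokens_per_day(tokens_per_second)
    return cost / (tokens / 1_000_000)


if __name__ == "__main__":
    # Hypothetical inputs: 400 W average draw, $0.20/kWh, 100 tokens/s.
    print(f"${electricity_cost_per_million_tokens(400, 0.20, 100):.3f} per 1M tokens")
    # prints $0.222 per 1M tokens
```

Swap in your own measured power draw, local $/kWh, and throughput; the result is directly comparable to a hosted per-1M-token price.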


That's assuming consumption pricing remains as-is.

There has been a lot of market subsidy in AI, and it is starting to fade away: e.g. the Copilot quotas/pricing. When VC switches from investing to wanting a return, the price equation is likely to change.

There is no subsidy on most OpenRouter providers; they are profitable today.

You buy a big GPU, you serve LLMs, you print money.

> do the electricity math

Could you give an example with real figures?

obviously depends on your location and GPU

for me it would be about $2 per day in electricity to generate 8M tokens of Gemma4-26B at 4-bit quantization. this excludes the cost of the GPU itself (no amortization)

ignoring the fact that I could get more free tokens per day for this model from Google/OpenRouter, the same 8M tokens would cost $4 per day on OpenRouter if paid, but they would run it at full 16-bit precision

this would be the most "profitable" model for me
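Converting the Gemma4-26B figures from this comment into a per-1M-token price makes the comparison explicit. The $2/day, $4/day, and 8M tokens/day are the commenter's numbers; the helper name `per_million` is just for illustration. As stated above, the local figure is electricity only and ignores GPU amortization, and the hosted figure is for 16-bit weights versus 4-bit locally.

```python
# Gemma4-26B figures from the comment, expressed per 1M tokens.
# Local cost is electricity only; GPU purchase cost is excluded.

def per_million(daily_cost_usd: float, tokens_per_day: float) -> float:
    """Convert a daily cost and a daily token count into $ per 1M tokens."""
    return daily_cost_usd / (tokens_per_day / 1_000_000)


TOKENS_PER_DAY = 8_000_000  # Gemma4-26B, per the comment

local = per_million(2.0, TOKENS_PER_DAY)       # local, 4-bit, electricity only
openrouter = per_million(4.0, TOKENS_PER_DAY)  # OpenRouter paid, 16-bit

print(f"local:      ${local:.2f} per 1M tokens")       # $0.25
print(f"openrouter: ${openrouter:.2f} per 1M tokens")  # $0.50
print(f"hosted costs {openrouter / local:.1f}x as much per token")  # 2.0x
```

So on electricity alone, local 26B comes out at half the hosted price per token, which is why it is the "most profitable" case here, before counting the GPU and the precision gap.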

for Gemma4-31B I can generate only 1M tokens per day, so I pay more to get lower quality than on OpenRouter (ignoring that this model is also free on Google)
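The 26B vs 31B difference comes down to throughput: for a fixed daily electricity bill, local generation only beats a hosted price once it produces enough tokens per day. A minimal break-even sketch, assuming electricity is the only local cost (no GPU amortization). The function names are hypothetical. The thread gives neither the 31B electricity draw nor its hosted price, so the 1M-tokens case in the usage lines reuses the 26B numbers ($2/day, $0.50 per 1M) purely for illustration.

```python
# Break-even throughput: tokens per day at which local electricity cost per
# token equals a hosted per-1M-token price. Assumes electricity is the only
# local cost (GPU purchase excluded), as in the comments above.

def break_even_tokens_per_day(daily_electricity_usd: float,
                              hosted_usd_per_million: float) -> float:
    """Daily token count at which local and hosted cost per token are equal."""
    if hosted_usd_per_million <= 0:
        raise ValueError("hosted price must be positive")
    return daily_electricity_usd / hosted_usd_per_million * 1_000_000


def local_is_cheaper(daily_electricity_usd: float, tokens_per_day: float,
                     hosted_usd_per_million: float) -> bool:
    """True if local generation beats the hosted price on electricity alone."""
    return tokens_per_day > break_even_tokens_per_day(
        daily_electricity_usd, hosted_usd_per_million)


if __name__ == "__main__":
    # 26B figures from the thread: $2/day electricity, $0.50 per 1M hosted.
    print(break_even_tokens_per_day(2.0, 0.50))    # 4000000.0
    print(local_is_cheaper(2.0, 8_000_000, 0.50))  # True
    # Hypothetical: only 1M tokens/day at the same $2/day and hosted price.
    print(local_is_cheaper(2.0, 1_000_000, 0.50))  # False
```

With the same power bill, dropping from 8M to 1M tokens per day raises the local cost per token eightfold, which is the situation described for the 31B model.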