by bachmeier 55 days ago
"Local inference is rarely cheaper if you’re being honest with yourself about how much you actually use it."

Sorry, but this is not even close to "being honest"; it's bad math. That calculation assumes you do nothing with the computer other than local inference.

2 comments

Doesn't that calculation assume you value your privacy and ownership at zero too?
Huh, you've made me curious. Let's actually do that calculation. Let's say you really do use AI 24/7, all year. Let's say by some miracle you can do 60 t/s on Qwen 3.6 27b, and let's say this PC cost $3000 (you should be able to do this on a DGX Spark, and one of the non-Nvidia models, e.g. the Dell one; $3000 would be a good price, but not totally out of the question). And, of course, let's say these prices remain stable.

So that gets you 60 t/s * 86_400 s/day * 365 days = 1_892_160_000 tokens per year at full blast.

If you go the OpenRouter, eh, route, you'd get charged at least $2 per million tokens (anywhere from $2 to $3.6 per million tokens) [1]. So the value you'd get from your machine at 100% utilization is 1892 * $2 = $3784, up to 1892 * $3.6 ≈ $6800.

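So yeah, not counting electricity and your time, the machine "is worth it" at 100% utilization: even at the low-end price it produces more than its $3000 cost in API-equivalent tokens within a year.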
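
If anyone wants to play with the numbers, here is a rough Python sketch of the same arithmetic. The function names and the break-even helper are my own additions for illustration, and it uses the same assumptions as above (60 t/s, a $3000 machine, $2 to $3.6 per million tokens, electricity and time ignored).

```python
# Back-of-the-envelope comparison of local inference vs. paying per token.
# All inputs are the assumptions from the comment above, not measured values.

SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31_536_000, ignoring leap years


def tokens_per_year(tokens_per_second, utilization=1.0):
    """Tokens generated in a year at a given fraction of full-time use."""
    return tokens_per_second * SECONDS_PER_YEAR * utilization


def api_cost(tokens, price_per_million):
    """What the same number of tokens would cost at a per-million price."""
    return tokens / 1_000_000 * price_per_million


def break_even_utilization(machine_cost, tokens_per_second, price_per_million):
    """Fraction of the year the machine must be busy to pay for itself
    in one year (electricity and your time not counted)."""
    full_year_value = api_cost(tokens_per_year(tokens_per_second), price_per_million)
    return machine_cost / full_year_value


if __name__ == "__main__":
    tokens = tokens_per_year(60)
    print(f"tokens/year at full blast: {tokens:,.0f}")
    for price in (2.0, 3.6):
        value = api_cost(tokens, price)
        share = break_even_utilization(3000, 60, price)
        print(f"${price}/M tokens: ${value:,.0f}/year, break-even at {share:.0%} utilization")
```

The break-even line is the part that speaks to the original quote: it tells you what fraction of the year the box has to be generating tokens before it beats the API on price alone.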

[1] https://openrouter.ai/qwen/qwen3.6-27b/providers