Hacker News
by discordance 4 days ago
Where I live, prices are often higher than 20c/kWh, but let's take your example and halve it (10c/kWh), so it's ~$1.40/day or ~$500/year.

On OpenRouter, the cheapest GLM 5.2 provider costs $3/MTok (at 44 tps). Assuming most use is output tokens, that's still the equivalent of 450k tokens/day, so we're in the same ballpark, but without the capex for two 3090s and the machine.
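
The comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the commenter's own calculation: the function names are made up for illustration, and the 600W continuous draw is an assumption taken from the figure quoted elsewhere in the thread.

```python
def daily_electricity_cost(watts: float, price_per_kwh: float, hours: float = 24.0) -> float:
    """Cost in dollars of running a constant load for `hours` hours."""
    return watts / 1000 * hours * price_per_kwh


def tokens_for_budget(budget_usd: float, price_per_mtok: float) -> float:
    """Number of API tokens a dollar budget buys at a per-million-token price."""
    return budget_usd / price_per_mtok * 1_000_000


# Assumed inputs: 600W around the clock, 10c/kWh, $3 per million tokens.
cost_per_day = daily_electricity_cost(600, 0.10)
print(f"electricity: ${cost_per_day:.2f}/day, ${cost_per_day * 365:.0f}/year")
print(f"API tokens for the same money: {tokens_for_budget(cost_per_day, 3.0):,.0f}/day")
```

Under those assumptions the electricity comes to about $1.44/day, which buys about 480k tokens/day at $3/MTok, close to the rounded ~$1.40 and 450k figures in the comment.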

Self-hosting only makes economic sense if your priority is being in control / avoiding surveillance.


That's true, there are a lot of places where power is considerably more expensive than $0.20 USD/kWh. But also the 600W figure assumes that it's fully loaded 24x7x365.

A system that draws 600W with all CPU cores, the RAM, and a few 3090-class GPUs maxed out might draw only around 90W when idle at 0.00 Unix load.

If we say: (600 * 24 * 31)/1000 = 446 kWh in a month at full load 24 hours a day

But it could be less. If it's at full load only 12 hours a day: (90 * 12 * 31)/1000 = 33.48 kWh of idle time in a month, plus (600 * 12 * 31)/1000 = 223 kWh of "full load" 600W time in a month, for about 257 kWh in total.

If you're the only user accessing it and you only "use" it 12 hours a day, that cumulative yearly dollar figure would be almost halved. Or even less if a person is using it in bursts and intermittently throughout an 8 hour workday.
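
The duty-cycle arithmetic above generalizes to any split between loaded and idle hours. A minimal sketch, assuming the 600W load and 90W idle figures from this comment and a 31-day month; `monthly_kwh` is a hypothetical helper name, not something from the thread.

```python
def monthly_kwh(load_w: float, idle_w: float, load_hours_per_day: float, days: int = 31) -> float:
    """Energy used in a month when the machine is loaded for part of each day
    and idle for the remaining hours."""
    idle_hours_per_day = 24 - load_hours_per_day
    watt_hours_per_day = load_w * load_hours_per_day + idle_w * idle_hours_per_day
    return watt_hours_per_day * days / 1000


full_time = monthly_kwh(600, 90, load_hours_per_day=24)
half_time = monthly_kwh(600, 90, load_hours_per_day=12)
workday_bursts = monthly_kwh(600, 90, load_hours_per_day=4)  # assumed: ~4 loaded hours in a workday

print(f"24 h/day load: {full_time:.1f} kWh")
print(f"12 h/day load: {half_time:.1f} kWh ({half_time / full_time:.0%} of full-time)")
print(f" 4 h/day load: {workday_bursts:.1f} kWh ({workday_bursts / full_time:.0%} of full-time)")
```

With these inputs the 12-hour case works out to roughly 57% of the full-time energy, which is the "almost halved" figure once the idle draw is counted.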

> person is using it in bursts and intermittently throughout an 8 hour workday.

You can’t do that with 6 tps, though.

The usage is irrelevant if we're interested in cost per token. If you use it half as much, you get half as many tokens at half the cost. It's still $5.56 in electricity per million output tokens either way (using $0.20/kWh, adjust accordingly if you have cheaper electricity). If you use the API, you also pay half as much if you use half as much.
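
For reference, the $5.56 figure is consistent with a 600W draw at 6 tokens per second. The sketch below assumes those two inputs (both appear in the thread, but pairing them is an inference here), and `electricity_cost_per_mtok` is an illustrative name.

```python
def electricity_cost_per_mtok(watts: float, tokens_per_second: float, price_per_kwh: float) -> float:
    """Electricity cost in dollars to generate one million tokens at a steady rate."""
    seconds = 1_000_000 / tokens_per_second  # time to produce 1M tokens
    kwh = watts / 1000 * seconds / 3600      # energy drawn over that time
    return kwh * price_per_kwh


# Assumed inputs: 600W at the wall, 6 tokens/s, $0.20/kWh.
print(f"${electricity_cost_per_mtok(600, 6, 0.20):.2f} per million tokens")
# Halving the electricity price halves the per-token cost; usage level does not enter at all.
print(f"${electricity_cost_per_mtok(600, 6, 0.10):.2f} per million tokens")
```

Note that hours of use per day is not a parameter: the per-token cost depends only on power draw, generation speed, and the electricity price, which is the point made above. Idle draw between bursts is ignored in this sketch.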

I think that's the biggest difference for most. If you can amortize the hardware costs, then 'burst usage' is cheaper at home to a degree, because you are paying a fixed monthly rate elsewise. Overall, though, for most people it is likely cheaper to use the cloud than to run at home, but it really depends on what you want.

> because you are paying a fixed monthly rate elsewise

No, you would pay usage-based rates with the API in this case. I have exactly one fixed monthly rate for the 6 AI models I have tokens available for.

> But also the 600W figure assumes that it's fully loaded 24x7x365.

It isn't 100% efficient. Even the best PSUs aren't.

I was referring to a 600W load as measured at the wall, such as if you plugged a desktop system with a single power supply and an ordinary IEC C13/C14 to NEMA 5-15 male power cord into a Kill A Watt to measure its instantaneous wattage.

Of course the cumulative wattage of all the components in an x86-64 desktop workstation is going to be a different figure than the draw from the AC wall socket, since even the best power supplies are somewhere around 83-86% efficient in reality. It could easily be 700W, 800W, or some other figure, depending on what CPUs and GPUs a person puts in it.
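
The relationship between wall draw and component draw can be sketched as a simple efficiency ratio. This assumes a single constant efficiency figure (real PSU efficiency varies with load), uses the 83-86% range claimed above, and the function names are illustrative.

```python
def component_watts(wall_watts: float, efficiency: float) -> float:
    """Power delivered to the components for a given draw measured at the wall."""
    return wall_watts * efficiency


def wall_watts(component_watts_needed: float, efficiency: float) -> float:
    """Draw at the wall needed to deliver a given power to the components."""
    return component_watts_needed / efficiency


for eff in (0.83, 0.86):
    print(
        f"efficiency {eff:.0%}: 600W at the wall delivers {component_watts(600, eff):.0f}W; "
        f"600W of components draws {wall_watts(600, eff):.0f}W at the wall"
    )
```

Under that assumption, 600W measured at the wall corresponds to roughly 500-516W of component load, while components that sum to 600W would pull roughly 700-720W from the socket.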