| Not everyone lives in a place where electricity is $0.20 a kWh. For instance, BC Hydro residential rates are $0.11 (CAD) a kWh for the first tier and $0.14 for the second tier of consumption in a month. At the current exchange rate, $0.14 CAD is $0.099 USD a kWh. Hydro Quebec is even cheaper. At a theoretical 6 tok/s and 86,400 seconds in a day, approximately 500,000 tokens of GLM5.2 output for 2 bucks a day seems like a pretty good bargain to me. Of course, that's not counting the one-time cost of the hardware to run it, but I see people dropping $4000-5000 on all kinds of much less useful stuff. Additionally, in a place where people use electric baseboard heating, electric in-floor radiant heating, or really any other heating-element-based system in winter that's less efficient than a heat pump, additional electrical load from computing is basically "free", since you would be spending that same money otherwise to heat your house. If a computer with 512GB of RAM is dumping its waste heat into your room, it accomplishes a portion of the same thing as a baseboard heater. Not to mention there is a whole other, less measurable benefit of having a locally hosted model that can't be turned off or arbitrarily restricted by a service provider, and where none of your queries or context cache are subject to surveillance by any third party. |
On OpenRouter, the cheapest GLM 5.2 provider costs $3/MTok (at 44 tps). Assuming most use is output tokens, that's still the equivalent of 450k tokens/day, so we're in the same ballpark, but without the capex for two 3090s and the machine.
Self-hosting only makes economic sense if your priority is being in control / avoiding surveillance.
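
For anyone who wants to redo the arithmetic with their own numbers, here is a rough sketch in Python. It only uses the figures quoted in this thread (6 tok/s, $2 a day, $0.099 USD a kWh, $3/MTok); the function names are made up for illustration, and it ignores hardware cost, input-token pricing, and the heating offset entirely.

```python
SECONDS_PER_DAY = 86_400

def tokens_per_day(tok_per_s: float) -> float:
    """Tokens generated in 24 hours of uninterrupted output at a given rate."""
    return tok_per_s * SECONDS_PER_DAY

def api_cost_usd(tokens: float, usd_per_mtok: float) -> float:
    """Cost of buying the same number of output tokens from a hosted provider."""
    return tokens / 1_000_000 * usd_per_mtok

def implied_avg_kw(usd_per_day: float, usd_per_kwh: float) -> float:
    """Average power draw that a given daily electricity bill corresponds to."""
    return usd_per_day / usd_per_kwh / 24

# Figures quoted in the thread, not measurements.
local_tokens = tokens_per_day(6)           # theoretical 6 tok/s, all day
api_usd = api_cost_usd(local_tokens, 3.0)  # $3/MTok, output tokens only
avg_kw = implied_avg_kw(2.0, 0.099)        # "2 bucks a day" at $0.099 USD/kWh

print(f"local output:      {local_tokens:,.0f} tokens/day")
print(f"same volume, API:  ${api_usd:.2f}/day")
print(f"$2/day implies an average draw of about {avg_kw * 1000:.0f} W")
```

Swap in your own rate, throughput, and provider price; the comparison is only as good as the assumption that the box really runs flat out 24 hours a day.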