by KronisLV 9 hours ago
> I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts.

Presently they trail SOTA by about 6-12 months rather than being on par (averaged across everything they do).

DeepSeek V4 Pro with Max reasoning is very affordable even if you pay per token. This month I pushed about 486 million tokens through it (admittedly >95% of that was cache hits, which is pretty typical for agentic development), and it cost me about 8 USD in total. If I had to pay API prices for Opus or even Sonnet, I would be a much sadder camper. That said, DeepSeek makes a lot of stupid mistakes, so it's not ideal.
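To put those numbers in per-million-token terms, here is a quick sketch. The first function just divides the reported bill by the reported token count. The second shows why a high cache-hit ratio matters so much. It treats every token as billed at either a "cache hit" or "cache miss" rate and ignores output tokens. The prices passed to it below are placeholders I made up for illustration, not DeepSeek's actual rates.

```python
def effective_cost_per_million(total_usd: float, tokens: int) -> float:
    """Average USD paid per one million tokens, across all token types."""
    return total_usd / tokens * 1_000_000


def blended_cost_usd(tokens: int, cache_hit_ratio: float,
                     hit_price_per_m: float, miss_price_per_m: float) -> float:
    """USD cost when a share of tokens is billed at the cache-hit rate.

    Prices are per million tokens. Output tokens are ignored for simplicity.
    """
    hits = tokens * cache_hit_ratio
    misses = tokens - hits
    return (hits * hit_price_per_m + misses * miss_price_per_m) / 1_000_000


# Figures from the comment: ~486M tokens for ~8 USD.
print(f"{effective_cost_per_million(8, 486_000_000):.4f} USD per 1M tokens")

# Hypothetical prices (NOT real DeepSeek pricing): misses cost 10x hits.
print(f"{blended_cost_usd(486_000_000, 0.95, 0.01, 0.10):.2f} USD")
```

With a 95% hit ratio, the 5% of misses can still account for a large chunk of the bill whenever misses are priced much higher than hits.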

Meanwhile GLM-5.2, which recently came out, is also quite capable and close to Opus on many tasks, and its coding plan is more cost-effective than Anthropic's: https://z.ai/subscribe

I will still stick with Anthropic, but I'm considering downgrading from Max 5x to Pro, which would cut my monthly spend from around 108 EUR to under 20 EUR (there's also a discount if you pay for a year up front). I'd probably add the yearly GLM Pro plan as well, which should bring my yearly expenses down from around 1300 EUR in total to about 750 EUR while still giving me a fairly decent setup.
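The yearly figures follow from the monthly ones with simple arithmetic. This sketch only uses the numbers stated above: 108 EUR/month for Max 5x is about 1300 EUR/year, and going to roughly 750 EUR/year saves a bit over 40%. The helper names are my own, for illustration.

```python
MONTHS_PER_YEAR = 12


def annual_cost(monthly_eur: float) -> float:
    """Yearly cost of a plan billed monthly, with no annual discount applied."""
    return monthly_eur * MONTHS_PER_YEAR


def savings(old_total: float, new_total: float) -> tuple[float, float]:
    """Absolute savings and savings as a fraction of the old total."""
    saved = old_total - new_total
    return saved, saved / old_total


current = annual_cost(108)  # Max 5x at ~108 EUR/month
saved, share = savings(1300, 750)
print(f"Max 5x per year: {current:.0f} EUR")
print(f"Saved per year: {saved:.0f} EUR ({share:.0%})")
```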

For the consumer, that is doable and practical.

For the people actually running these models, who knows; at least DeepSeek and others are working on making the models more efficient, so the numbers become more feasible.

I've also run Qwen3.6 35B A3B on-prem, and it kinda sucks. It's way better than models of that size from a year ago, but it still lags behind Sonnet and DeepSeek V4 Flash because of its size. Plus, to run it myself I'd need a pretty beefy setup, most likely a pair of Intel Arc Pro B70s with 32 GB of VRAM each (which my PSU could still handle), and even then the output would be kinda bullshit and I'd have to spend an unpleasant amount of time fixing it.
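For a rough idea of why two 32 GB cards is roughly the floor for a 35B model, here is a back-of-the-envelope weight-memory estimate. It assumes the weights dominate and ignores KV cache, activations, and runtime overhead, all of which need extra headroom. The "A3B" part means only about 3B parameters are active per token, but in a mixture-of-experts model all 35B weights typically still need to be resident in memory.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GB (1 GB = 1e9 bytes) needed for the weights alone.

    Billions of params times bytes per param gives GB directly.
    """
    return params_billion * bits_per_weight / 8


TOTAL_VRAM_GB = 2 * 32  # assumed setup: two cards with 32 GB each

for label, bits in [("FP16/BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    need = weight_memory_gb(35, bits)
    verdict = "fits" if need < TOTAL_VRAM_GB else "does not fit"
    print(f"{label}: ~{need:.1f} GB of weights, {verdict} in {TOTAL_VRAM_GB} GB")
```

At 16-bit the weights alone (~70 GB) exceed 64 GB, so in practice you'd be running a quantized version.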