Hacker News new | ask | show | jobs
by DeathArrow 58 days ago
>This model? You can run it at Q4 with 70GB of VRAM. >This beats the latest Sonnet while running locally

Not sure it will beat Sonet at Q4.

>This is approaching consumer level territory (you can get a Mac Studio with 128GB of RAM for ~3500 USD).

For $3500 I can get 7-8 years of GLM using coding plans, have a faster model and much better code quality.

3 comments

> Not sure it will beat Sonet at Q4.

Very valid. Importance-weighted quantization and TurboQuant on model weights can reduce loss a lot compared to "traditional" Q4 so one can be hopeful.

> For $3500 I can get 7-8 years of GLM using coding plans, have a faster model and much better code quality

But you will own no computer, and that's also assuming prices stay what they are. Anyway my point was not whether or not it makes financial sense for everyone. A lot of people are very happy not owning their movies, software, games, cars or house. I'm just happy there is a future where the people can own and locally run the tech that was trained on their stolen data.

@simjnd, I hate this idea but you remember how radio had been regulated to death? And how fast one will be triangulated if one decides to run a "self hosted" radio station today? My bet is in 5 years not only owning AI-inference-capable computer but using AI itself will be regulated. Essentially, we will have to scan biometrics to just ask any SOTA model to "summarise this".

Why? Because capable and free models at the dawn of AI almost made people think again and - oh oh - ask questions!

> For $3500 I can get 7-8 years of GLM using coding plans, have a faster model and much better code quality.

I know HN's distaste for crypto, but I do my inference (for personal stuff - not my employer) through Venice. I was in the airdrop for VVV, and kept as much of it staked as I could. I have ~$40/day in inference as long as that service lasts.

These days the multiplier is about 1000x last I checked; if you want $10/day in inference and can lock up $10k in VVV, you get ~$10/day in inference plus (currently) ~16% APY in the form of more VVV.

I'm not sure I'd want to invest that much if I had to today, but it's a reasonable option. The risk of VVV going to $0 seems pretty small to me.

> For $3500 I can get 7-8 years of GLM

mind sharing where's the go to place to pay for open models?

I recommend using OpenRouter (openrouter.ai). Basically a broker between inference providers and you which allows you to pick, try, and switch models from a massive catalog, extremely transparent about usage and pricing.
+5% to every API call.
I've had a decent experience with ollama cloud. It is slower than going thru openrouter but much, much cheaper -- the generosity of their $20 plan reminds me of what the Claude Code $20 plan was back in the day
You can get GLM coding plans from Z.ai and Ollama Cloud and OpenCode Go.