HN Mirror

by ActorNightly 3 days ago

Very false.

I use small models exclusively. They aren't a replacement for large models. You need decent hardware to run those models efficiently, as smaller-parameter models plain suck and are still slow on MacBooks. And the affordability of higher-end hardware is very limited.

Even at non-VC-subsidized $/token prices, it's still much cheaper to run cloud-based models.

dvt 3 days ago

> Even at non-VC-subsidized $/token prices, it's still much cheaper to run cloud-based models.

On a price-per-wattage level, this is not true; people have done the math on /r/LocalLLaMA many times over[1]. Local models, while not as good as premier models (GPT 5.5, etc.), are like ~80%+ of the way there, and often converge to a similar solution after a few dead ends.

[1] https://www.reddit.com/r/LocalLLM/comments/1kshq4f/electrici...

fwip 3 days ago

Maybe not per watt, but unless you already happen to own the 3090 cited by that post, you'd have to buy that as well, and it is currently selling for around $1400 used.
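
The trade-off between electricity cost and up-front hardware cost can be sketched as a break-even calculation. This is only a rough illustration: the $1400 card price comes from the comment above, while the wattage, throughput, electricity rate, and cloud price are hypothetical placeholders, not measured figures.

```python
def electricity_cost_per_million_tokens(watts, tokens_per_second, usd_per_kwh):
    """Electricity cost (USD) to generate one million tokens locally."""
    seconds = 1_000_000 / tokens_per_second
    kwh = watts * seconds / 3_600_000  # watt-seconds -> kilowatt-hours
    return kwh * usd_per_kwh


def break_even_million_tokens(hardware_usd, local_usd_per_mtok, cloud_usd_per_mtok):
    """Millions of tokens needed before the hardware pays for itself.

    Returns None when local generation is not cheaper per token,
    in which case the purchase never breaks even.
    """
    saving = cloud_usd_per_mtok - local_usd_per_mtok
    if saving <= 0:
        return None
    return hardware_usd / saving


# Hypothetical inputs: 350 W draw, 30 tokens/s, $0.15/kWh, $2.00 per million cloud tokens.
local = electricity_cost_per_million_tokens(watts=350, tokens_per_second=30, usd_per_kwh=0.15)
print(break_even_million_tokens(hardware_usd=1400, local_usd_per_mtok=local, cloud_usd_per_mtok=2.0))
```

With numbers like these, the per-token electricity cost can be lower than the cloud price while the purchase still takes a large volume of tokens to pay off, which is the point both sides are arguing over.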

strictnein 3 days ago

3090s are running $1400 now? Wowsers. I thought I was overspending when I bought 6x of them for around $800 a pop.

Might be time to sell, to be honest. It's fun to have that at home, but I can't justify having $10k (with memory, mobo, cpu, etc) sitting in my basement without being fully utilized.

karim79 3 days ago

I'll take two of them. A thousand apiece.

dvt 3 days ago

I do have a 3090 Ti on my gaming PC, but even my old M1 MBP (with a mere 32 GB of RAM) is quite competent and can run a quantized `Gemma4-26B-A4B` in the background while I do other stuff.

ActorNightly 2 days ago

The MBP running Gemma4 is absolutely useless for any real work.

nozzlegear 2 days ago

What is "real work"?

ActorNightly 2 days ago

Where you are developing software. It's significantly faster to use Google Gemini and copy-paste code back and forth compared to having Gemini edit files for you.

ClikeX 2 days ago

To be fair, I can also use that 3090 for other things locally. Not just AI.

davnicwil 3 days ago

Well, to be fair, that's right now. I think the question is: what about in 6 months, 12 months, 2 years?

Where do these improvement curves go? Does the gap close, do they intersect for practical purposes (factoring in cost etc.)? Or is the local curve always just a translation of the hosted one, lagging behind? Or does hosted just pull ahead?

Nobody knows, but it's a very open question, I feel, and it certainly appears that the answer might quite reasonably be yes, they intersect on that kind of short-ish time horizon.

ActorNightly 3 days ago

>Where do these improvement curves go?

Nowhere.

Large models haven't seen that much improvement, just better performance on small, specific tasks, all of which is special-cased RL to game metrics.

For local models, it's the same story. You can download Gemma 3 QAT from last year, and it will be just as good as Gemma:31b on average. Qwen also boasts that it's better, because again, they RLed it to game some metrics. It's better at coding than Gemma, but Gemma is better at more creative thinking (again, all RL).

Fundamentally, you need detail in the gradients for the models to pick up on the smaller details. If you don't have those, your output is gonna suck. No amount of clever architecture is going to fix this.

The only way to improve local models is by training them to fetch context, and then their job becomes much simpler because all they need to do is reinterpret the fetched content and provide an answer. But fundamentally, if you are trying to keep things in house for advertising purposes, like what all companies do with search, you want users to go to your service, which means running on your servers. And it's not really that much extra per invocation (i.e., excluding initial hardware costs) to instead just offer a large model as a service, which will be way better than any small model.
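
For illustration, the "fetch context, then reinterpret it" pattern can be sketched at inference time as below. This is a toy under stated assumptions, not the training approach the comment describes: the word-overlap scoring, the function names, and the prompt wording are all hypothetical, and the actual call to a local model is left out.

```python
def score(query, doc):
    """Count distinct lowercase words shared by the query and a document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def fetch_context(query, docs, k=2):
    """Return the k documents with the highest word overlap with the query."""
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Assemble a prompt so the model only has to reinterpret fetched text."""
    context = "\n".join(f"- {doc}" for doc in fetch_context(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical mini-corpus; a real setup would fetch from search or a document store.
docs = [
    "the 3090 has 24 GB of VRAM",
    "bananas are yellow",
    "VRAM limits model size",
]
print(build_prompt("how much VRAM does the 3090 have", docs, k=1))
```

The small model's task then shrinks to reading the fetched lines and answering, rather than recalling the fact from its own weights.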

iwontberude 2 days ago

Just need a decent Mac Studio and they are plentiful in used condition and affordable.
