
	by root_axis 46 days ago
	Sorry, but you're just seeing what you want to see. The idea that a 31b model is anywhere even in the ballpark of something like Opus 4.5 is absurd on its face.

3 comments

thot_experiment 46 days ago

False. The absolute capability is irrelevant; with the proper harness, 31b is more than adequate for a very large portion of the tasks I ask AI to do. The metric isn't how good the model is at Erdős problems, it's how reliably it can remove drudgery in my life. It just autonomously reverse engineered a Bluetooth protocol with minimal intervention, and its ability to react to data and ground itself is constantly impressive to me. I do a ton of testing with these models; today I had Gemma answer a physics problem that Opus 4.7 gave up on. With a decent harness and context, the set of tasks where both models are good enough is surprisingly large. The tasks I have that stump Gemma often also stump Opus 4.7.

diordiderot 46 days ago

Maybe reaching for an analogy would be helpful here.

thot_experiment is saying that his 2016 Toyota Prius is a great, reliable car for his daily commute and for running errands.

Meanwhile, everyone else is screeching about its capability gap with a Lockheed Martin F-35 Lightning.

thot_experiment 45 days ago

Yeah, thanks, though I think local models are at least a Cessna, which, while nothing like an F-35, can still fly.

aceazzameen 44 days ago

Flying is fun. But shooting Cessnas out of the sky is more fun!

I'm kidding around. I run 31b models myself too and am perfectly happy with them.

amelius 46 days ago

This is like saying that 640kB is enough for anybody.

thot_experiment 46 days ago

No, it isn't. I am saying that the set of tasks that can be completed by Opus 4.7 has a surprisingly large overlap with the set of tasks that can be completed by Gemma 31B. The two are meaningfully equivalent in many cases.

(Of course, if I'm being honest, 640kB is fine. I'm sure tons of the world's commerce is handled by less. The delta between a system with 640kB of RAM and a modern one is near nil for many people: the UX on a PoS terminal does not require more than that, for example, and the Hacker News UX could also be roughly the same.)

lioeters 45 days ago

> 640kB is fine

How refreshing to hear this kind of old-school hacker thinking, in a thread where most people have given up on local computing in exchange for convenience and permanent third-party dependency.

With embedded systems affordable and ubiquitous, hopefully a growing segment of the new generation will also learn to push the limits of available hardware and see how far we can take it. As an engineer, there's a satisfaction in solving things with what you've got.

There's a new technique, a 1-bit family of language models, that can achieve up to 9x memory efficiency compared to existing models. Still multiple gigabytes for practical use, I imagine, but it's great progress toward local AI, which I believe will be common in the near future. https://prismml.com/news/ternary-bonsai

degamad 45 days ago

It's more like saying "HIMEM.SYS is not much better than 640kB".

BoredomIsFun 46 days ago

It would be true if model providers did not throttle their models. I do not have definitive proof that they do, but the rumors are abundant.

creativeSlumber 45 days ago

I think you are missing the point here. What matters is that, for that user, the local models are good enough for their use case.
