by loeg 39 days ago
> At some point, you can let a less smart model hammer at a problem for longer and get to the same result, and as long as you are not involved it comes to the same thing.

Is that true? I find that smarter models can simply be effective where smaller models can't. It isn't just a matter of waiting longer.


It's almost certainly not true yet, but at some point speed vs. quality (and, let's not forget, cost) might reach an equilibrium where it's true for most of what you do.

Perhaps you'd still turn to hosted models for the hardest tasks, but most tasks go local. It does seem like that would make demand go down significantly.

Of course that's all predicated on model advances plateauing, or at least getting increasingly expensive for incremental improvements, such that local open source models can catch up on that speed/quality/cost curve. But there is a fair amount of evidence that's happening. The models are still getting noticeably better, but relative improvement does seem to be slowing, and cost is seemingly only going up.

Why is this presumed to be de facto inevitable? Consider:

* local compute isn’t scaling as before, so algorithmic improvements are the only way models get meaningfully faster and smarter

* all those same algorithmic improvements would also be true for larger models

* hardware manufacturers have an incentive against local LLMs because cloud LLMs are so much more lucrative (+ corps would buy desktop variants if they were good enough)

So no, it’s not clear quality will ever be comparable. It may be good enough for what you want but there will always be a harder problem that you need to throw more compute and more memory at.

> It may be good enough for what you want but there will always be a harder problem that you need to throw more compute and more memory at.

Sure, but if the “good enough for what you want” covers the vast majority of cases, data-center AI becomes just for the very extreme edge cases. Like how I can render a 4k rez video game at 60fps on my home pc, but if pixar wants to render their next movie they use data-center compute.

> all those same algorithmic improvements would also be true for larger models

Smaller models run faster. If ten runs of a small model get me the same quality result as one run of the big model, and the small model runs 10x faster, then they are functionally the same.

Even accepting the premise, it should be obviously true that 10 dumber models running 10x as fast != 1 smarter model. Otherwise engineering would just be a matter of throwing people at a problem, when it’s very clear that 1 talented engineer can outperform a team of engineers or accomplish things the team would never have been able to. There’s also the assumption you’re making that a 10x smaller model is 10x dumber when it’s not. It’s a curve, and some people seem to struggle with nonlinear effects.

> it should be obviously true that 10 dumber models running 10x as fast != 1 smarter model

If a smaller model tries ten things and reaches the same conclusion the big model gets on the first try, then yeah 10x small = 1x big. Is that where we are at now? Idk probably not - but it’s not hard to imagine something like that emerging soon. There is already evidence that smaller models get some things _better_ than bigger models (e.g. https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag... )

> There’s also the assumption you’re making that a 10x smaller model is 10x dumber when it’s not

That is not an assumption I am making. I said “a smaller model” not “a 10x smaller model”. Model speed and model “intelligence” are both non-linear.
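
The "ten tries of a small model vs. one try of a big model" exchange can be put into rough arithmetic. This is only a sketch under strong assumptions that nobody in the thread stated: each attempt succeeds independently with a fixed probability, and something reliable (a test suite, a verifier) picks out a correct attempt when one exists. The function names and the example probabilities are made up for illustration.

```python
from typing import Optional


def success_within(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(p_small: float, p_big: float, cap: int = 10_000) -> Optional[int]:
    """Smallest number of small-model tries that matches one big-model try.

    Returns None if the target is not reached within `cap` tries, which is
    always the case when the small model never solves the task (p_small == 0).
    """
    for n in range(1, cap + 1):
        if success_within(p_small, n) >= p_big:
            return n
    return None


# Hypothetical numbers: small model solves the task 20% of the time,
# big model 90% of the time.
print(round(success_within(0.2, 10), 3))  # ten tries still fall a bit short of 0.9
print(attempts_needed(0.2, 0.9))          # tries needed to match the big model
print(attempts_needed(0.0, 0.9))          # None: retries cannot fix a 0% task
```

Under these assumptions the relationship is non-linear in both directions: retries help a lot when the small model sometimes gets there, and not at all when its per-attempt success rate is zero.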

> Like how I can render a 4k rez video game at 60fps on my home pc, but if pixar wants to render their next movie they use data-center compute.

This is a very nice analogy actually and it impacts the whole story about US vs. Chinese leadership in "frontier AI".

I think you're correct about the standard thinking approach (just generate a big stream of tokens before drafting your actual answer). After a while, additional thinking just results in loops.

The RSA approach from https://rsa-llm.github.io/, expanded on by https://www.zyphra.com/post/zaya1-8b, looks like a promising way to squeeze a bit more intelligence from a small model. As I understand it, running multiple independent thinking traces in parallel gives you a chance of one of them finding a different local optimum, whereas running a single trace for longer is likely to just circle around one optimum.
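
The "parallel traces vs. one longer trace" intuition resembles random restarts in local search. The toy below is not the RSA algorithm and involves no language model. It is a greedy hill climber on a made-up bumpy function, and the function, start points, and budgets are all hypothetical, chosen only to show the shape of the argument.

```python
import math


def bumpy(x: float) -> float:
    """Made-up objective with several local peaks on [0, 10]."""
    return math.sin(3 * x) + 0.2 * x


def hill_climb(start: float, budget: int, step: float = 0.01) -> float:
    """Greedy local search: move to the better neighbour until stuck."""
    x = start
    for _ in range(budget):
        neighbours = [min(10.0, x + step), max(0.0, x - step)]
        best = max(neighbours, key=bumpy)
        if bumpy(best) <= bumpy(x):
            break  # stuck on a local peak; leftover budget goes unused
        x = best
    return x


def best_of_restarts(starts, budget_each: int) -> float:
    """Run one short climb per start point and keep the best end point."""
    return max((hill_climb(s, budget_each) for s in starts), key=bumpy)


# Same total budget either way: 1 x 1000 steps vs. 5 x 200 steps.
# Fixed start points keep the example deterministic; a real setup would
# sample them randomly.
one_long = hill_climb(1.0, budget=1000)
many_short = best_of_restarts([1.0, 3.0, 5.0, 7.0, 9.0], budget_each=200)

print(f"one long run:   x={one_long:.2f}  score={bumpy(one_long):.3f}")
print(f"five short runs: x={many_short:.2f}  score={bumpy(many_short):.3f}")
```

The single long run stops at the first peak it reaches and never uses the rest of its budget, while the spread-out short runs land on different peaks and the best of them scores higher.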

That said, at the end of the day, there's only so much information a small model can contain. If a model just doesn't know some key piece of information, no amount of thinking will help it figure out a solution that depends on that information.