| HN Mirror

	by blcknight 64 days ago
	There is a cognitive ceiling for what you can do with smaller models. Animals with simpler neural pathways often outperform whatever we think they are capable of, but there's no substitute for scale. I don't think you'll ever get a 4B or 8B model equivalent to Opus 4.6. Maybe just for coding tasks, but certainly not with Opus' breadth.

3 comments

zarzavat 64 days ago

The only thing that we are sure can't be highly compressed is knowledge, because you can only fit so much information in a given entropy budget without losing fidelity.

The minimal size limits of reasoning abilities are not clear at all. It could be that you don't need all that many parameters. In which case the door is open for small focused models to converge to parity with larger models in reasoning ability.

If that happens we may end up with people using small local models most of the time, and only calling out to large models when they actually need the extra knowledge.
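A back-of-the-envelope sketch of the entropy budget point, in Python. The 4B and 8B sizes come from the parent comment; the bit widths (4-bit and 16-bit weights) are assumptions picked for illustration. The result is only a hard upper bound on raw storage; how much usable knowledge actually fits is an open question and is presumably well below it.

```python
def capacity_bits(num_params: int, bits_per_param: float) -> float:
    """Upper bound on the information the weights can hold, in bits."""
    return num_params * bits_per_param


def capacity_gigabytes(num_params: int, bits_per_param: float) -> float:
    """Same bound expressed in gigabytes (10**9 bytes)."""
    return capacity_bits(num_params, bits_per_param) / 8 / 1e9


# Assumed configurations: a 4B model at 4 bits per weight,
# and an 8B model at 16 bits per weight.
for params, bits in [(4_000_000_000, 4), (8_000_000_000, 16)]:
    print(f"{params:,} params at {bits} bits: "
          f"at most {capacity_gigabytes(params, bits):.1f} GB")
```

Whatever the training method, the weights cannot encode more than this many bytes, which is why knowledge has a floor on size while reasoning may not.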

idle_zealot 64 days ago

> and only calling out to large models when they actually need the extra knowledge

When would you want lossy encoding of lots of data bundled together with your reasoning? If it is true that reasoning can be done efficiently with fewer parameters it seems like you would always want it operating normal data searching and retrieval tools to access knowledge rather than risk hallucination.

And re: this discussion of large data centers versus local models, do recall that we already know it's possible to make a pretty darn clever reasoning model that's small and portable and made out of meat.

Gareth321 63 days ago

I find it difficult to understand the distinction between parametric knowledge and reasoning skills in LLMs. I still think of them as distinct, but I understand there is significant overlap. Arguably, they are the same thing in LLMs. So I would assume that if reasoning is high quality, using RAG could be logical (if much slower). However, if the lack of parametric knowledge impacts reasoning, then use of larger models seems warranted. A dumb LLM wouldn't offer sufficient results even with all the RAG in the world.

aldonius 63 days ago

I guess we can imagine a pure reasoning model (if that's even the right word any more) with almost zero world-knowledge. How does it know what to look for? How does it do any meaningful communication at all?

So I think it's useful to have an imprecise-but-fairly-accurate set of world knowledge as part of an otherwise reasoning-heavy model. It's a cache.

And if it's an LLM, or something like that, I think it basically has to have world-knowledge built in, because what is natural language if not communication about the world?

dryarzeg 64 days ago

> we already know it's possible to make a pretty darn clever reasoning model

There is a problem though: we know that it is possible, but we don't know how to do it (at least not yet, as far as I am aware). So we know the answer to the "what?" question, but we don't know the answer to the "how?" question.

adrianN 64 days ago

I would call brains with the needed support infrastructure small.

yorwba 64 days ago

I think you underestimate the amount of knowledge needed to deal with the complexities of language in general as opposed to specific applications. We had algorithms to do complex mathematical reasoning before we had LLMs, the drawback being that they require input in restricted formal languages. Removing that restriction is what LLMs brought to the table.

Once the difficult problem of figuring out what the input is supposed to mean was somewhat solved, bolting on reasoning was easy in comparison. It basically fell out with just a bit of prompting, "let's think step by step."

If you want to remove that knowledge to shrink the model, we're back to contorting our input into a restricted language to get the output we want, i.e. programming.

charcircuit 64 days ago

I think you are underestimating the strength a small model can get from tool use. There may be no substitute for scale, but that scale can live outside of the model and be queried using tools.

In the worst case a smaller model could use a tool that involves a bigger model to do something.
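A minimal sketch of that idea in Python: a small model runs the loop and either answers or calls a tool, and one of the tools is simply a bigger model. Everything here is hypothetical: `Step`, `run_agent`, `toy_small_model` and both tools are stand-ins made up for illustration, not any real library's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Step:
    kind: str       # "answer" or "tool"
    payload: str    # the answer text, or the query to send to the tool
    tool: str = ""  # which tool to call when kind == "tool"


def run_agent(question: str,
              small_model: Callable[[str], Step],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 3) -> str:
    """Let the small model drive; scale lives in the tools it can call."""
    context = question
    for _ in range(max_steps):
        step = small_model(context)
        if step.kind == "answer":
            return step.payload
        tool_fn = tools.get(step.tool)
        if tool_fn is None:
            return f"error: unknown tool {step.tool!r}"
        # Append the tool output so the next model call can see it.
        context = f"{context}\n[{step.tool} result] {tool_fn(step.payload)}"
    return "error: step limit reached"


def toy_small_model(context: str) -> Step:
    """Stub policy standing in for a real small model."""
    if "result]" in context:
        # A tool already ran: turn its output into the final answer.
        return Step("answer", context.rsplit("] ", 1)[1])
    if context.startswith("lookup:"):
        return Step("tool", context[len("lookup:"):].strip(), tool="search")
    # Worst case: hand the whole question to the bigger model.
    return Step("tool", context, tool="big_model")


TOOLS = {
    "search": lambda q: {"hn": "Hacker News"}.get(q.lower(), "no hit"),
    "big_model": lambda q: f"big-model answer to {q!r}",
}

print(run_agent("lookup: hn", toy_small_model, TOOLS))
print(run_agent("explain entropy", toy_small_model, TOOLS))
```

The small model only has to decide where to route a request; the knowledge and the heavy lifting sit behind the tool boundary.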

sroussey 64 days ago

Small models are bad at tool use. I have liquidai doing it in the browser but it’s super fragile.

dd8601fn 64 days ago

I don’t really understand this, but I hear it a lot so I know it’s just confusion on my part.

I’m running little models on a laptop. I have a custom tool service made available to a simple little agent that uses the small models (I’ve used a few). It’s able to search for necessary tool functions and execute them, just fine.

My biggest problem has been the llm choosing not to use tools at all, favoring its ability to guess with training data. And once in a while those guesses are junk.

Is that the problem people refer to when they say that small models have problems with tool use? Or is it something bigger that I wouldn’t have run into yet?
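For anyone trying to picture the setup described above, here is a rough Python sketch of a searchable tool service. It is an assumption about the general shape, not the actual implementation: `REGISTRY`, `register`, `search_tools`, `call_tool` and the two example tools are all hypothetical names, and the keyword-overlap ranking is a placeholder for whatever search a real service would use.

```python
from typing import Callable, Dict, List, Tuple

# name -> (description, function); hypothetical in-memory tool service
REGISTRY: Dict[str, Tuple[str, Callable[..., object]]] = {}


def register(name: str, description: str):
    """Decorator that adds a function to the registry."""
    def wrap(fn):
        REGISTRY[name] = (description, fn)
        return fn
    return wrap


@register("add", "add two numbers together")
def add(a: float, b: float) -> float:
    return a + b


@register("word_count", "count the words in a piece of text")
def word_count(text: str) -> int:
    return len(text.split())


def search_tools(query: str, limit: int = 3) -> List[str]:
    """Rank tools by how many query words appear in their description."""
    words = set(query.lower().split())
    scored = []
    for name, (description, _) in REGISTRY.items():
        overlap = len(words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first, ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


def call_tool(name: str, **kwargs):
    """Execute a registered tool by name with keyword arguments."""
    _, fn = REGISTRY[name]
    return fn(**kwargs)


print(search_tools("count words"))
print(call_tool("word_count", text="small models on a laptop"))
```

Note that nothing in this sketch forces the model to call `search_tools` at all, which is exactly the failure described above: the plumbing works, but the model can still skip it and guess.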

dathinab 64 days ago

except you don't want knowledge in the model, and most of that "size" comes from "encoded knowledge", i.e. overfitting. The goal should be to only have language handling in the model, and the knowledge in a database you can actually update, analyze etc. It's just really hard to do so.

"world models" (for cars) maybe make sense for self driving, but they are also just a crude workaround for having a physics simulation to push understanding of physics. Though unlike most topics, basic physics tends not to change randomly, and it's based on observation of reality, so it probably can work.

Law, health advice, programming stuff etc. on the other hand change all the time and are all based on what humans wrote about them. Which in some areas (e.g. law or health) is very commonly outdated, wrong or at least incomplete in a dangerous way. And programming changes all the time.

Having this separation of language processing and knowledge sources is ... hard, language is messy and often interleaves with information.

But this is most likely achievable with smaller models. Actually it might even be easier with a small model. (Though whether the necessary knowledge bases can fit and run on a Mac is another topic...)

And this should be the goal of AI companies, as it's the only long term sustainable approach as far as I can tell.

I say should because it may not be, because if they solve it that way and someone manages to clone their success then they lose all their moat for specialized areas, as people can create knowledge bases for those areas with know-how OpenAI simply doesn't have access to. (Which would be a preferable outcome as it means actual competition and a potential fair working market.)

dathinab 64 days ago

as a concrete outdated case:

The TLS key exchange group X25519MLKEM768 (often loosely called a cipher) is recommended to be enabled on servers that support it

last time I checked AI didn't even list it when you asked it for a list of TLS 1.3 ciphers (though it has been widely supported since even before it was fully standardized..)

this isn't surprising as most input sources AI can use for training are outdated and also don't list it

maybe someone at OpenAI will spot this and feed it explicitly into the next training cycle, or people will cover it more and through this it gets fed in implicitly

but what about all the niche but important information that is covered only by a handful of outdated Stack Overflow posts or similar? (which are unlikely to get updated now that everyone uses AI instead..)

The current "let's just train bigger models with more encoded data" approach just doesn't work. It can get you quite far, but then it hits a ceiling. And trying to fix it by also giving the model additional knowledge "it can ask for if it doesn't know" has so far not worked, because it reliably fails to realize it doesn't know when it has enough outdated/incomplete/wrong information encoded in the model. Only by ensuring it doesn't have any specialized domain knowledge can you make sure that approach works IMHO.
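To make the separation concrete, here is a small Python sketch of a "lookup first, never guess" policy over an updatable knowledge base. It is only an illustration under assumptions: `Fact`, `KnowledgeBase` and `answer` are hypothetical names, the entries are placeholders, and a real system would need actual retrieval rather than an exact-key dictionary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class Fact:
    text: str
    updated: date  # when this entry was last revised


class KnowledgeBase:
    """Knowledge kept outside the model, so it can be updated and audited."""

    def __init__(self) -> None:
        self._facts: Dict[str, Fact] = {}

    def upsert(self, topic: str, text: str, updated: date) -> None:
        self._facts[topic.lower()] = Fact(text, updated)

    def lookup(self, topic: str) -> Optional[Fact]:
        return self._facts.get(topic.lower())


def answer(topic: str, kb: KnowledgeBase) -> str:
    """Answer only from the knowledge base; admit ignorance otherwise."""
    fact = kb.lookup(topic)
    if fact is None:
        # No fallback to memorized (possibly stale) training data.
        return f"I have no source for {topic!r}."
    return f"{fact.text} (source updated {fact.updated.isoformat()})"


kb = KnowledgeBase()
kb.upsert("Example Topic", "first version", date(2024, 1, 1))
print(answer("example topic", kb))

# Updating the entry changes the answer without retraining anything.
kb.upsert("Example Topic", "second version", date(2025, 6, 1))
print(answer("example topic", kb))
print(answer("unknown topic", kb))
```

The point of the design is that a missing entry is visible as a miss, instead of being papered over by whatever outdated text happened to be in the training set.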
