| HN Mirror


by Aurornis 232 days ago

> In LLMs, we will have bigger weights vs test-time compute tradeoffs. A smaller model can get "there" but it will take longer.

Assuming both are SOTA, a smaller model can't produce the same results as a larger model by giving it infinite time. Larger models inherently have more room for training more information into the model.

No amount of test-retry cycle can overcome all of those limits. The smaller models will just go in circles.

I even get the larger hosted models stuck chasing their own tail and going in circles all the time.

3 comments

yorwba 232 days ago

It's true that to train more information into the model you need more trainable parameters, but when people ask for small models, they usually mean models that run at acceptable speeds on their hardware. Techniques like mixture-of-experts allow increasing the number of trainable parameters without requiring more FLOPs, so they're large in one sense but small in another.
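To make the "large in one sense but small in another" point concrete, here is a rough back-of-the-envelope sketch. It counts only the feed-forward weights of a single layer, ignores biases and attention, and assumes a simple top-k router; the function name and all the sizes are made up for illustration and do not describe any particular model.

```python
def moe_ffn_params(d_model: int, d_ff: int, num_experts: int, top_k: int):
    """Return (total, active-per-token) parameter counts for one MoE feed-forward layer.

    Assumptions (illustrative only): each expert is an up-projection plus a
    down-projection without biases, and the router is a single linear map
    from d_model to num_experts.
    """
    per_expert = 2 * d_model * d_ff      # up-projection + down-projection
    router = d_model * num_experts       # routing weights, always used
    total = num_experts * per_expert + router
    active = top_k * per_expert + router  # only top_k experts run per token
    return total, active


# Hypothetical sizes: 8 experts, 2 of them active for each token.
total, active = moe_ffn_params(d_model=1024, d_ff=4096, num_experts=8, top_k=2)
print(f"trainable parameters:    {total:,}")
print(f"parameters used / token: {active:,}")
print(f"ratio: {total / active:.2f}x")
```

Per-token compute scales roughly with the active count, while capacity for stored information scales with the total, which is why the same model can look large or small depending on which number you care about. Note that all experts still have to fit in memory.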

And you don't necessarily need to train all information into the model; you can also use tool calls to inject it into the context. A small model that can make lots of tool calls and process the resulting large context could obtain the same answer that a larger model would pull directly out of its weights.
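A toy sketch of that idea, with plain dictionaries standing in for "knowledge in the weights" and for an external source. Everything here (`EXTERNAL_FACTS`, `lookup_tool`, `answer`) is a hypothetical placeholder rather than a real model or tool API, and it skips the hard part, which is the model deciding what to look up and making sense of the result.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical external source (stands in for a search index or web page).
EXTERNAL_FACTS: Dict[str, str] = {"capital of france": "Paris"}


def lookup_tool(query: str) -> Optional[str]:
    """Pretend tool call: return a fact from the external source, if present."""
    return EXTERNAL_FACTS.get(query.strip().lower())


def answer(
    question: str,
    memorized: Dict[str, str],
    tool: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Answer from memorized facts first, then fall back to a tool call.

    Returns (answer, source), where source is "weights", "tool", or "unknown".
    """
    key = question.strip().lower()
    if key in memorized:
        return memorized[key], "weights"
    retrieved = tool(key)  # inject external information into the context
    if retrieved is not None:
        return retrieved, "tool"
    return None, "unknown"


large_model = {"capital of france": "Paris"}  # fact stored "in the weights"
small_model: Dict[str, str] = {}              # nothing memorized

large_result = answer("Capital of France", large_model, lookup_tool)
small_result = answer("Capital of France", small_model, lookup_tool)
print(large_result)
print(small_result)
```

Both calls end up with the same answer; the difference is only where it came from, and that the small model paid for an extra lookup.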


naasking 231 days ago

> No amount of test-retry cycle can overcome all of those limits. The smaller models will just go in circles.

That's speculative at this point. In the context of agents with external memory, this isn't so clear.


woctordho 232 days ago

Almost all training data is on the internet. As long as the small model has enough agentic browsing ability, given enough time it will retrieve the data from the internet.
