by Aurornis
232 days ago
> In LLMs, we will have bigger weights vs test-time compute tradeoffs. A smaller model can get "there" but it will take longer. Assuming both are SOTA, a smaller model can't produce the same results as a larger model by giving it infinite time. Larger models inherently have more room for training more information into the model. No amount of test-retry cycles can overcome all of those limits. The smaller models will just go in circles. I even get the larger hosted models stuck chasing their own tail and going in circles all the time.
And you don't necessarily need to train all information into the model; you can also use tool calls to inject it into the context. A small model that can make lots of tool calls and process the resulting large context could obtain the same answer that a larger model would pull directly out of its weights.
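A toy sketch of that tradeoff, purely illustrative: the dicts standing in for "weights", the `lookup_tool` function, and the `ToyModel` class are all hypothetical and say nothing about how a real LLM stores knowledge. The point is only that a model missing a fact in its weights can still reach the same answer by spending a tool call and carrying the result in its context.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stand-ins for knowledge baked into weights:
# the "large" model has memorized more facts than the "small" one.
LARGE_MODEL_WEIGHTS = {"capital_of_france": "Paris", "boiling_point_water_c": "100"}
SMALL_MODEL_WEIGHTS = {"capital_of_france": "Paris"}

# Hypothetical external source (think search engine or database) reachable via a tool.
EXTERNAL_SOURCE = {"capital_of_france": "Paris", "boiling_point_water_c": "100"}


def lookup_tool(key: str) -> Optional[str]:
    """Hypothetical tool call: fetch a fact from outside the model."""
    return EXTERNAL_SOURCE.get(key)


@dataclass
class ToyModel:
    weights: Dict[str, str]
    can_call_tools: bool = False
    context: List[str] = field(default_factory=list)
    tool_calls: int = 0

    def answer(self, key: str) -> Optional[str]:
        # First, try to answer straight out of the "weights".
        if key in self.weights:
            return self.weights[key]
        # Otherwise, if tools are allowed, spend a tool call and
        # inject the result into the context before answering.
        if self.can_call_tools:
            self.tool_calls += 1
            result = lookup_tool(key)
            if result is not None:
                self.context.append(f"{key}={result}")
                return result
        # No memorized fact and no tool access: the model can't answer.
        return None


large = ToyModel(LARGE_MODEL_WEIGHTS)
small = ToyModel(SMALL_MODEL_WEIGHTS, can_call_tools=True)
print(large.answer("boiling_point_water_c"), large.tool_calls)  # 100 0
print(small.answer("boiling_point_water_c"), small.tool_calls)  # 100 1
```

Both reach the same answer; the small model just pays for it with an extra tool call and a longer context, which is the weights-vs-compute tradeoff in miniature. Without tool access the small model simply returns `None`.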