by rafaelero 849 days ago
You are incorrect. Increasing compute during inference yields gains similar to increasing parameters/compute at training time (see self-consistency, tree of thoughts, etc.).
2 comments

Can you elaborate on that? Apart from the multiplication and accumulation of activations and weights, what additional computations can be applied to improve the outputs?

I think it has already been implied that we are not talking about increasing the number of parameters in this context, but about the possibility of applying additional compute to a model with a given number of parameters.

You can train a smaller model and run inference multiple times, and it will reach performance similar to a larger model running inference just once. The best way to make use of those multiple inferences is still up for debate, but we already know it works (self-consistency is one example).
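To make the self-consistency idea concrete, here is a minimal sketch in Python. It is not a real model: `sample_answer` is a hypothetical stand-in for one sampled completion, and the 60% chance of a correct answer and the set of wrong answers are assumptions chosen purely for illustration. It only shows the voting step: sample several answers, keep the most common one.

```python
import random
from collections import Counter

CORRECT = "42"                 # assumed ground-truth answer for the toy task
WRONG = ("41", "43", "44")     # assumed set of plausible wrong answers


def sample_answer(rng, p_correct=0.6):
    """Hypothetical stand-in for one sampled model completion.

    Returns the correct answer with probability p_correct,
    otherwise one of the wrong answers chosen uniformly.
    """
    if rng.random() < p_correct:
        return CORRECT
    return rng.choice(WRONG)


def self_consistency(rng, n_samples):
    """Sample n_samples answers and return the most frequent one."""
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the majority-voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(self_consistency(rng, n_samples) == CORRECT for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(f"{n:>2} samples per question: accuracy {accuracy(n):.3f}")
```

In this toy setup the same "model" gets more questions right as the number of samples per question grows, because its errors are scattered across different wrong answers while its correct answers agree. Whether a real model behaves this way depends on how its errors are distributed, which the sketch does not attempt to capture.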
I wasn't able to elaborate on what I meant by "better" when I asked the question, but the idea can indeed be summarized as "will an LLM increase the quantity and quality of its parameters if you give it more processing power and time". Now I know that language models don't do that at all: the weights are "frozen" after training, and the response is assembled from them by generating possible output strings, which are steered by pre-prompts such as asking for chain of thought and reasoning paths, which in the end are nothing more than extra context for the same weights to act on. (I'm just thinking out loud here)
Yeah, I totally forgot about training time and time of request (aaah, inference time! now I get it.) being completely different points in time because the LLM has no access to the training data anymore.