HN Mirror



	by rafaelero 849 days ago
	You are incorrect. Increasing compute at inference time yields gains similar to increasing parameters or compute at training time (see self-consistency, tree of thoughts, etc.).

2 comments

Lerc 849 days ago

Can you elaborate on that? Apart from the multiplication and accumulation of activations and weights, what additional computations can be applied to improve the outputs?

I think it has already been implied that we are not talking about increasing the number of parameters in this context, but about the possibility of applying additional compute to a model with a given number of parameters.

rafaelero 849 days ago

You can train a smaller model and run inference multiple times, and it will reach performance similar to that of a larger model running inference just once. The best way to make use of those multiple inferences is still up for debate, but we already know it works (self-consistency is one example).
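
To make the self-consistency idea concrete, here is a rough sketch of the voting step: sample the same model several times and keep the most common answer. The `noisy_model` function is a hypothetical stand-in for one sampled LLM inference (it returns the right answer 60% of the time by assumption), not a real API, and the numbers are purely illustrative.

```python
import random
from collections import Counter

CORRECT = "42"  # assumed ground-truth answer for the toy question


def noisy_model(question, rng, p_correct=0.6):
    """Hypothetical stand-in for one sampled inference of a small model."""
    if rng.random() < p_correct:
        return CORRECT
    # Wrong answers are spread over several alternatives.
    return rng.choice(["41", "43", "44"])


def self_consistency(question, n_samples, rng):
    """Run inference n_samples times and return the majority answer."""
    answers = [noisy_model(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the majority vote is correct."""
    rng = random.Random(seed)
    hits = sum(
        self_consistency("toy question", n_samples, rng) == CORRECT
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(f"{n:2d} samples per question -> accuracy {accuracy(n):.3f}")
```

The weights never change here; the only thing that grows is the number of forward passes per question, which is the "more compute at inference" being discussed. The voting only helps because the toy model is right more often than it produces any single wrong answer.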

frannyg 848 days ago

I wasn't able to elaborate on what I meant by "better" when I asked the question, but the idea can indeed be summarized as "will an LLM increase the quantity and quality of its parameters if you give it more processing power and time". Now I know that language models don't do that at all. The "frozen" weights learned from the training data are what assemble the response to the user's request: the model generates possible output strings, which are steered by pre-prompts such as asking for chain of thought and reasoning paths, and those are in the end nothing more than extra context pulling the same weights toward something more specific. (I'm just thinking out loud here)

frannyg 848 days ago

Yeah, I totally forgot that training time and the time of the request (aaah, inference time! now I get it.) are completely different points in time, and that the LLM no longer has access to the training data at that point.
