by PeterisP
849 days ago
No, the standard LLM implementations currently in use apply a fixed amount of computation during inference, which is chosen and "baked in" by the model architecture before training. They don't really have the option to "think a bit more" before giving the answer; generating each token takes exactly the same number of matrix multiplications. Well, in theory they could probably be modified to do that, but we don't do it properly yet, even if some styles of prompts, e.g. "let's think step by step", kind of nudge the model in that direction. The same model will give the same result, and more processing power will simply let you get the inference done faster. On the other hand, more resources may enable (or be required for) a different, better model.
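
To make the "fixed amount of computation" point concrete, here is a minimal sketch in plain Python. It assumes a toy stack of dense layers rather than a real transformer, and the names `make_toy_model`, `matvec` and `forward` are made up for illustration. It also ignores attention over a growing context, where the matrices get larger as the sequence gets longer even though the number of multiplications per layer stays the same. The only thing it shows is that the work done in a forward pass is set by the architecture (layers and width), not by how "hard" the input is.

```python
import random


def make_toy_model(n_layers, width, seed=0):
    """Build a hypothetical model: a list of square weight matrices."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(width)]
        for _ in range(n_layers)
    ]


def matvec(W, x, counter):
    """Multiply matrix W by vector x, tallying scalar multiplications."""
    out = []
    for row in W:
        acc = 0.0
        for w, v in zip(row, x):
            acc += w * v
            counter[0] += 1
        out.append(acc)
    return out


def forward(model, x):
    """Run every layer once; return the output and the multiplication count."""
    counter = [0]
    for W in model:
        # ReLU after each layer; the nonlinearity adds no multiplications.
        x = [max(0.0, v) for v in matvec(W, x, counter)]
    return x, counter[0]


model = make_toy_model(n_layers=4, width=8)

# Two very different inputs: all zeros versus random values.
_, cost_easy = forward(model, [0.0] * 8)
_, cost_hard = forward(model, [random.random() for _ in range(8)])

# Both passes cost n_layers * width * width = 4 * 8 * 8 multiplications.
print(cost_easy, cost_hard)
```

Faster hardware would finish the same 256 multiplications sooner, but it would not change the output; only a different `n_layers` or `width` (that is, a different model) changes how much computation goes into each step.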
Is it wrong to think of this as misleading? Don't the results for exactly the same request differ, because the same computed weights can produce multiple output strings?
Or do you count "multiple ways to phrase the same thing" as the "same result", and I'm just being a noob?
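
For what it's worth, the distinction the question is getting at can be sketched like this. It is a toy illustration, not any particular library's API: the vocabulary, the logit values and the helper names `greedy` and `sample` are all assumptions. The idea is that the model's computation produces the same scores for the next token every time, and any variation between runs comes from how a token is then picked from those scores.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Always pick the highest-scoring token: fully deterministic."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits, rng, temperature=1.0):
    """Draw a token at random in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical next-token scores; identical on every call for the same prompt.
VOCAB = ["yes", "sure", "indeed", "no"]
LOGITS = [2.0, 1.5, 1.0, -1.0]

greedy_picks = {VOCAB[greedy(LOGITS)] for _ in range(100)}

rng = random.Random(0)
sampled_picks = {VOCAB[sample(LOGITS, rng)] for _ in range(100)}

print(greedy_picks)   # a single token
print(sampled_picks)  # several different tokens from the same scores
```

So "the same model will give the same result" holds at the level of the computed scores, and holds for the final text too if decoding is greedy. With random sampling the text can differ between runs, but no extra computation went into any of the variants.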