by TZubiri
559 days ago
The fewer tokens produced at inference, the lower the quality of the response tends to be. For an LLM, the process of thinking happens through the words it generates, which is why prompts that ask the LLM to return only the answer tend to produce lower-quality responses.

In general, a model has to learn to say "I don't know" explicitly, instead of "I don't know" being implicit in the negative space of tokens falling into a weak distribution. Softmax also normalizes the token logits into probabilities that sum to 1, so if none of the options is any good (all next tokens suck), sampling can pick more or less at random from a bunch of bad choices, which then locks the model into a continuation based on that first bad choice.
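
To make the normalization point concrete, here is a minimal sketch in plain Python. The logit values are made up for illustration and do not come from any real model, and `softmax` and `sample` are hypothetical helper names. It only shows that softmax hands back a valid probability distribution even when every logit is low, so the sampler has no built-in way to signal "none of these are good".

```python
import math
import random

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical logits for four candidate next tokens.
confident = [9.0, 1.0, 0.5, 0.2]    # one clearly good option
unsure = [-8.1, -8.0, -8.2, -7.9]   # every option scores badly

p_confident = softmax(confident)
p_unsure = softmax(unsure)

rng = random.Random(0)
print("confident:", [round(p, 3) for p in p_confident])
print("unsure:   ", [round(p, 3) for p in p_unsure])
print("sampled index from unsure:", sample(p_unsure, rng))
```

In the `unsure` case the absolute scores are all terrible, but after normalization each token still gets roughly a quarter of the probability mass, so one of them gets picked anyway. Because softmax only looks at differences between logits, the "everything is bad" signal is gone by the time sampling happens.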