by billti
428 days ago
If it’s predicting a next token to maximize scores against a training/test set, naively, wouldn’t that be expected? I would imagine very little of the training data consists of a question followed by an answer of “I don’t know”, thus making it statistically very unlikely as a “next token”.

One could imagine a fine-tuning procedure that gives a model better knowledge of itself: test it, and on prompts where its most probable completions are wrong, fine-tune it to say "I don't know" instead. Though the "are wrong" is doing some really heavy lifting, since it wouldn't be simple to determine that without a better model that knew the right answers.
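
To make that concrete, here is a rough sketch of how the fine-tuning targets might be built. Everything in it is hypothetical: `model` is just any callable that returns the most probable completion for a prompt, `build_idk_dataset` and `toy_model` are made-up names, and exact string matching against a reference answer stands in for the hard part (deciding what counts as "wrong"), which in practice would need labeled data or a stronger grader.

```python
from typing import Callable, Dict, List, Tuple

IDK = "I don't know"


def build_idk_dataset(
    model: Callable[[str], str],
    qa_pairs: List[Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Build fine-tuning examples from the model's own answers.

    Correct answers are kept as targets; wrong ones are replaced with IDK.
    Assumption: a case-insensitive exact match defines "correct".
    """
    examples = []
    for prompt, reference in qa_pairs:
        completion = model(prompt)  # the model's most probable completion
        correct = completion.strip().lower() == reference.strip().lower()
        target = completion if correct else IDK
        examples.append({"prompt": prompt, "target": target})
    return examples


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: canned answers, one of them wrong."""
    canned = {
        "Capital of France?": "Paris",
        "Capital of Australia?": "Sydney",  # deliberately wrong
    }
    return canned.get(prompt, "42")


dataset = build_idk_dataset(
    toy_model,
    [
        ("Capital of France?", "Paris"),
        ("Capital of Australia?", "Canberra"),
    ],
)
```

The resulting `dataset` would then be fed to whatever fine-tuning pipeline is in use. Note that the reference answers are exactly the "better model that knew the right answers" from the comment above, so the sketch does not remove that dependency, it only makes it explicit.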