Hacker News
by eru 1157 days ago
I think you are mixing up layers of abstraction.

The network is most likely trained with something like a categorical cross-entropy loss function. Those losses punish being confidently wrong far more than they punish saying "I don't know". See https://www.v7labs.com/blog/cross-entropy-loss-guide

It's just that saying "I don't know" means that your model is spreading the probability of what the next token in the text stream might be over many different outcomes: a very 'uniform' probability distribution instead of a sharp prediction.
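
A minimal sketch of why hedging pays off under cross-entropy, assuming a made-up 4-token vocabulary where the model genuinely cannot tell which token comes next (the probabilities and the `cross_entropy` helper are illustrative, not taken from any real model or library). The expected loss of a sharp guess is compared with that of a uniform "I don't know" distribution:

```python
import math

def cross_entropy(true_dist, predicted):
    """Categorical cross-entropy H(q, p) = -sum_i q_i * log(p_i), in nats."""
    return -sum(q * math.log(p) for q, p in zip(true_dist, predicted) if q > 0)

# Hypothetical setup: each of the 4 tokens is equally likely to be the true next token.
truth = [0.25, 0.25, 0.25, 0.25]

sharp_guess = [0.97, 0.01, 0.01, 0.01]  # confidently bets on token 0
hedged = [0.25, 0.25, 0.25, 0.25]       # the low-level "I don't know"

print(f"sharp guess: {cross_entropy(truth, sharp_guess):.3f}")
print(f"hedged:      {cross_entropy(truth, hedged):.3f}")
```

Under these assumptions the sharp guess averages roughly 3.46 nats of loss against about 1.39 (that is, ln 4) for the uniform distribution, so the loss rewards spreading probability when the model is genuinely unsure.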

That looks very different to GPT literally outputting the words "I don't know".


Sorry if I was unclear. I know that the model is incentivised to accurately predict the probability distribution of the next token. I mean that the model is not being incentivised to literally produce the output tokens corresponding to "I don't know" when asked a question where it is uncertain.

Yes, exactly.

What I wanted to emphasize is that the training _does_ actually incentivize the model to say "I don't know", but at a lower level.
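
Assuming you have access to the per-token probabilities, one common heuristic (not something stated in this thread) for reading that lower-level "I don't know" is the Shannon entropy of the next-token distribution: a sharp prediction has low entropy and a near-uniform one approaches the maximum. The distributions and the `entropy_bits` helper below are hypothetical, for illustration only:

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions over a 4-token vocabulary.
sharp = [0.97, 0.01, 0.01, 0.01]   # model is confident
spread = [0.25, 0.25, 0.25, 0.25]  # model "doesn't know"

print(f"sharp:  {entropy_bits(sharp):.2f} bits")
print(f"spread: {entropy_bits(spread):.2f} bits")  # log2(4) = 2.00, the maximum for 4 tokens
```

Here the sharp distribution comes out at about 0.24 bits against 2 bits for the uniform one.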

If only the OpenAI API gave us the token probabilities like it used to.