by s5ma6n
590 days ago
I am puzzled why they "asked the model" about its confidence instead of using the logprobs of the output tokens to estimate confidence in the responses. In my use case and tests, the model itself is not capable of giving a reliable confidence value, whereas logprobs almost always give a better view of calibration.
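As a rough sketch of the logprob approach, assuming you already have the per-token logprobs of the answer from whatever API you use: the helper name `sequence_confidence` and the sample values below are made up for illustration, not taken from any real response.

```python
import math

def sequence_confidence(token_logprobs):
    """Turn per-token logprobs into two simple confidence scores.

    Returns (joint, per_token):
      joint     = probability of the whole token sequence
      per_token = geometric mean of the token probabilities,
                  which is less sensitive to answer length
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    total = sum(token_logprobs)
    joint = math.exp(total)
    per_token = math.exp(total / len(token_logprobs))
    return joint, per_token

# Hypothetical logprobs for a three-token answer (illustrative values only)
answer_logprobs = [-0.05, -0.20, -0.01]
joint, per_token = sequence_confidence(answer_logprobs)
print(f"joint={joint:.3f} per_token={per_token:.3f}")
```

Whether either score is actually well calibrated still has to be checked against labeled data, e.g. by binning answers by score and comparing against accuracy in each bin.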
|
But of course that's not the way LLMs are normally used. And it precludes any sort of chain-of-thought reasoning.
For some questions, like those involving calculations, letting the model talk to itself produces much better results. For example, compare https://chatgpt.com/share/67238eda-6b08-8011-8d2d-a945f78e6f... to https://chatgpt.com/share/67235a98-d2c8-8011-b2bf-53c0efabea...