FTR, people are trying to build systems to compare LLMs with each other based on how well they are at saying "I don't know" (of course knowing is still rewarded higher): https://github.com/manyoso/haltt4llm
Would be cool to try to incorporate the previous token's confidence embedding into this process, but that would make training with a triangular attention mask not possible.