by whatever1 944 days ago
The thing is that LLMs can point out a logic error in their own reasoning if specifically asked to do so.

So maybe OpenAI just slapped an RL agent on top of the next-token generator.