I also hope we have something like this. But sadly, this is not going to work. The reason is this line from the article, which is so much harder than it looks:

> and a critic model filters the results for genuinely valuable ideas.

In fact, people have tried this idea. If you use an LLM or anything similar as the critic, the performance of the model actually degrades in the process, because the LLM tries too hard to satisfy the critic, and the critic itself is far from a good reasoner. So the reason we don't hear much about this idea is not that nobody tried it. It is that they tried, it didn't work, and people are reluctant to publish about something that does not work.
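A toy sketch of that failure mode, in Python. Everything here is hypothetical: the `Candidate` fields, the numbers, and the weights in `flawed_critic` are made up for illustration and do not come from any real system. It only shows that selecting by an imperfect critic's score can favor surface appeal over actual value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    true_value: float  # hypothetical ground-truth quality; the critic cannot see this directly
    polish: float      # hypothetical surface feature the critic over-rewards


def flawed_critic(c: Candidate) -> float:
    # Assumption: the critic's score is dominated by surface polish,
    # with only a weak dependence on real quality.
    return 0.3 * c.true_value + 0.7 * c.polish


# Made-up candidates standing in for generated ideas.
candidates = [
    Candidate("solid but plain idea", true_value=0.9, polish=0.1),
    Candidate("decent idea", true_value=0.6, polish=0.4),
    Candidate("polished nonsense", true_value=0.1, polish=0.9),
]

# What the critic-filtered pipeline keeps vs. what we actually wanted.
picked_by_critic = max(candidates, key=flawed_critic)
picked_by_truth = max(candidates, key=lambda c: c.true_value)

print("critic picks:", picked_by_critic.text)
print("best by true value:", picked_by_truth.text)
```

The more strongly the generator is pushed toward whatever the critic scores highly, the more this gap matters, since the generator has every incentive to produce more of the third kind of candidate.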
This affects more than a potential critic model. The entire concept of a "reasoning" model is based on the same flawed idea: that the model can generate intermediate context to improve its final output. If that self-generated context contains hallucinations, baseless assumptions, or doubt, the final output can only be an amalgamation of those. I've seen the "thinking" output arrive at a correct solution in the first few steps and then talk itself out of it later, or go into logical loops without actually arriving at anything.
"Reasoning" models tend to perform better simply because of larger scale and better training data. There's nothing inherently better about them. There's nothing intelligent either, but that's a separate discussion.