> Deducing things from the inability of an LLM to answer a specific question seemed doomed by the "it will be able to on the next itteration" principle.
That's orthogonal.If we are pointing in the right direction(s) then yes, next iteration could resolve all problems. If we are not pointing in the right direction(s) then no, next iteration will not resolve these problems. Given LLMs rapid improvement in regurgitating knowledge from their training data but simultaneously slow improvement in their ability to generalize (such as logic "puzzles"), I think it is naive to assume we're pointed in the right direction. Maybe we're even pointing in mostly the right direction. But why assume we are? We can continue in the direction we are going while simultaneously considering it might not be well aligned. If we are well aligned, that gives us more confidence and makes gathering funding easier. If we aren't, well it is easier to course correct sooner than later. In either case, you benefit from the analysis. Understanding why things fail is more important than understanding why things succeed. |
Thus observers of the LLM space like us need to keep finding novel “Bellweather problems” that we think will evaluate a model’s ability to reason, knowing that once we start talking about it openly the problem will no longer be a useful Bellweather.
By their nature as “weird-shaped” problems, these aren’t the kind of thing we’re guaranteed to have an infinite supply of. As the generations move on it will become more and more difficult to discern “actual improvements in reasoning” from “the model essentially has the solution to your particular riddle hard-coded”.