by lukev
373 days ago
You can absolutely extrapolate the results, because what this shows is that even when "reasoning", these models are still fundamentally repeating in-sample patterns, and that they collapse when faced with novel reasoning tasks above a small complexity threshold. That is not a model-specific claim; it's a claim about the nature of LLMs. For your argument to be true, there would need to be a qualitative difference, in which some models possess "true reasoning" capability and some don't, and this test just happened to look only at the latter.

Furthermore, we have clearly seen improvements in reasoning from previous frontier models to current frontier models.
If the authors could show, or did show, that both previous-generation and current-generation frontier models hit a wall at similar complexity, that would be something, but AFAIK they do not.