| > Do submarines swim? It doesn't matter HOW LLMs "swim" as long as they can, but the point being raised is whether they actually can. It's as if LLMs can swim in the ocean, in rough surf, but fail to swim in rivers or swimming pools, because they don't have a generalized ability to swim - they've just been RL-trained on the solution steps to swimming in surf, but since those exact conditions don't exist in a river (which might seem like a less challenging environment), they fail there. So, the question that might be asked is: when LLMs are trained to perform well in vertical domains like math and programming, where it's easy to verify results and provide outcome- or process-based RL rewards, are they really learning to reason, or are they just learning to pattern match, steering generation toward the problem-specific reasoning steps they were trained on? Does the LLM have the capability to reason/swim, or is it really just an expert system that has been given the rules to reason/swim in certain cases, but would need to be similarly hand-fed the reasoning steps to be successful in other cases? I think the answer is pretty obvious given that LLMs can't learn at runtime - they can't try out some reasoning generalization they may have arrived at, find that it doesn't work in a specific case, then explore the problem and figure it out for next time. Given that it's Demis Hassabis who is pointing out this deficiency of LLMs (and has a 5-10 year plan/timeline to fix it - AGI), not some ill-informed LLM critic, it seems silly to deny it. |
>It doesn't matter HOW LLMs "swim" as long as they can, but the point being raised is whether they actually can.
>It's as if LLMs can swim in the ocean, in rough surf, but fail to swim in rivers or swimming pools
Just like submarines!