|
As a PhD student in NLP who's graduating soon, my perspective is that language models do not demonstrate "reasoning" in the way most people colloquially use the term. These models have no capacity to plan ahead, which is a requirement for many "reasoning" problems. If it's not in the context, the model is unlikely to use it for predicting the next token. That's why techniques like chain-of-thought are popular; they cause the model to parrot a list of facts before making a decision, which increases the likelihood that the context contains parts of the answer. Unfortunately, this means the "reasoning" exhibited by language models is limited: if the training data does not contain generalizable text applicable to a particular domain, a language model is unlikely to make a correct inference when confronted with a novel version of a similar situation. That said, adding reasoning capabilities is an active area of research, but we don't have a clear time horizon for when that might happen. Current prompting approaches are stopgaps until research identifies a promising approach for developing reasoning, e.g. combining latent space representations with planning algorithms over knowledge bases, constraining the logits based on an external knowledge verifier, etc. (these are just random ideas; I'm not saying they are what people are working on, only that they are examples of possible approaches to the problem). In my opinion, language models have been good enough since the GPT-2 era, but have been held back by a lack of reasoning and efficient memory. Making language models larger and training them on more data makes them more useful by incorporating more facts with increased computational capacity, but the approach is fundamentally a dead end for higher-level reasoning capability. |
I'm curious where you are drawing your definition or scope for 'reasoning' from?
For example, in Shuren, "The Neurology of Reasoning" (2002), the definition selected was "the ability to draw conclusions from given information."
While I agree that LLMs can only process token by token, and that juggling context is critical enough to effective operation that CoT or ToT approaches are necessary to maximize the ability to synthesize conclusions, I'm not quite sure what definition of reasoning you have in mind such that these capabilities fall outside of it.
The typical lay suggestion that LLMs cannot generate new information or perspectives outside of the training data isn't accurate, as I'm sure you're aware; synthesizing new or original conclusions from input is very much within their capabilities.
Yes, this has to happen within a context window and occurs on a token-by-token basis, but that seems like a somewhat arbitrary distinction. Humans are unquestionably better than an LLM at memory access and at running multiple subprocesses on information.
But if anything, this simply suggests that practical short-term gains might lie in continuing to move toward multi-pass processing of NLP tasks, with selective contexts and a variety of fine-tuned specializations for intermediate processing.
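To make the multi-pass idea a bit more concrete, here is a rough Python sketch of what I have in mind. Everything in it is an assumption for illustration: `Stage`, `select_context`, and `run_passes` are hypothetical names, the keyword filter is a crude stand-in for real context selection, and each stage's `model` is just any callable that maps a prompt string to text (in practice, a differently fine-tuned model per stage).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One pass: an instruction, a context filter, and a (hypothetical) model."""
    instruction: str
    keywords: List[str]
    model: Callable[[str], str]  # stand-in for a fine-tuned LLM call


def select_context(notes: List[str], keywords: List[str]) -> List[str]:
    """Keep only notes that mention at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [n for n in notes if any(k in n.lower() for k in lowered)]


def run_passes(question: str, notes: List[str], stages: List[Stage]) -> str:
    """Run each stage on its own narrow context, feeding each result forward."""
    result = ""
    for stage in stages:
        context = select_context(notes, stage.keywords)
        prompt = "\n".join([
            f"Instruction: {stage.instruction}",
            "Context:",
            *context,
            f"Previous result: {result}",
            f"Question: {question}",
        ])
        result = stage.model(prompt)
    return result
```

The point of the structure is that no single pass has to hold everything in its context window: each stage sees only the notes selected for it plus the previous stage's output.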
As for the issue of new domains outside of training data, I'm somewhat surprised by your perspective. Hasn't one of the big research trends over the past twelve months been that in-context learning has proven more capable than was previously expected? I'd agree that a zero-shot evaluation of a problem type that isn't represented in an LLM's training data is setting it up for failure, but extending in-context examples beyond the training data has proven relatively more successful, no?
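For anyone following along, the zero-shot versus in-context distinction comes down to what goes into the prompt. Here is a minimal Python sketch, assuming a plain `Input:`/`Output:` prompt layout; `build_prompt` is a hypothetical helper, and the "reverse the word and append -ix" rule is a made-up task chosen only because it is unlikely to appear verbatim in any training set.

```python
from typing import Sequence, Tuple


def build_prompt(
    task: str,
    query: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Zero-shot when `examples` is empty, few-shot (in-context) otherwise."""
    lines = [task, ""]
    for source, target in examples:
        # Each worked example is placed in the context ahead of the query.
        lines += [f"Input: {source}", f"Output: {target}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


# Zero-shot: the model only gets the task description and the query.
zero_shot = build_prompt("Apply the rule.", "bird")

# Few-shot: the same query, preceded by worked examples of the made-up rule.
few_shot = build_prompt(
    "Apply the rule.",
    "bird",
    examples=[("cat", "tac-ix"), ("stone", "enots-ix")],
)
```

Whether a given model actually picks up the rule from the examples is an empirical question; the sketch only shows how the two evaluation setups differ.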