by joe_the_user
624 days ago
Deducing things from the inability of an LLM to answer a specific question seems doomed by the "it will be able to on the next iteration" principle. It seems like the only way you could systematically chart the weaknesses of an LLM is by having a class of problems that get harder for LLMs at a steep rate, so that a small increase in problem complexity requires a significant increase in LLM power.
That would be any problem more complicated than O(n) complexity, even with chain-of-thought prompting[1].
Note that the O(n) thing can bite you in all sorts of unintuitive ways: if the LLM+CoT can perform an O(n) Task A and O(m) Task B, then it can't do the O(nm) task "for every step of A, perform B on the result" unless you come up with a task-specific prompt outlining the solution. The alternative is to play RLHF Whack-A-Mole, separately training the LLM on the combined task. (I think this weakness might be why LLMs are hitting a wall in enterprise deployment, and also explains why LLM agents don't actually work.) The only way this will get fixed is with a fundamentally more sophisticated architecture.
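To make the O(nm) point concrete, here is a toy step count. It is a sketch built on assumptions: `task_a_steps`, `task_b` and `composed` are hypothetical stand-ins, not anything from the linked article, and this does not model an LLM. It only counts the sequential steps a step-by-step trace would need. Each of A's n steps triggers a full m-step run of B, so the trace grows as n*m rather than n + m.

```python
# Hypothetical illustration only: counts sequential steps a step-by-step
# (chain-of-thought-style) trace would need. Not a model of an LLM.

def task_a_steps(n: int) -> list[int]:
    """Hypothetical O(n) Task A: yields n intermediate results, one per step."""
    return list(range(n))


def task_b(x: int, m: int) -> tuple[int, int]:
    """Hypothetical O(m) Task B on one input; returns (result, steps used)."""
    acc = x
    for _ in range(m):
        acc += 1  # one step of B
    return acc, m


def composed(n: int, m: int) -> tuple[list[int], int]:
    """'For every step of A, perform B on the result': n + n*m steps total."""
    results = []
    steps = 0
    for a in task_a_steps(n):
        steps += 1            # one step of A
        b, used = task_b(a, m)
        steps += used         # a full m-step run of B
        results.append(b)
    return results, steps


if __name__ == "__main__":
    for n, m in [(10, 10), (10, 20), (20, 20)]:
        print(n, m, composed(n, m)[1])
```

Doubling either n or m roughly doubles the trace length, and doubling both roughly quadruples it, which is the sense in which the combined task is harder than either piece alone.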
[1] https://www.quantamagazine.org/how-chain-of-thought-reasonin...