Hacker News new | ask | show | jobs
by aithrowawaycomm 624 days ago
> It seems like the only way you could systematic chart the weaknesses of an LLM is by having a class of problems that get harder for LLMs at a steep rate

That would be any problem more complicated than O(n) complexity, even with chain-of-thought prompting[1].

Note that the O(n) thing can bite you in all sorts of unintuitive ways: if the LLM+CoT can perform an O(n) Task A and O(m) Task B, then it can't do the O(nm) task "for every step of A, perform B on the result" unless you come up with a task-specific prompt outlining the solution. The alternative is to play RLHF Whack-A-Mole, separately training the LLM on the combined task. (I think this weakness might be why LLMs are hitting a wall in enterprise deployment, and also explains why LLM agents don't actually work.) The only way this will get fixed is with a fundamentally more sophisticated architecture.

[1] https://www.quantamagazine.org/how-chain-of-thought-reasonin...