by ForceBru
67 days ago
IMO "thinking" here means "computation", like running matrix multiplications. Another view could be: "thinking" means "producing tokens". This doesn't require any proof because it's literally what the models do. As I understand it, the claim is: more tokens = more computation = more "thinking" => answer probably better.
|
Say that limit is X. This means if your problem fundamentally requires at least Y compute to be solved, your machine will never give you a reliable answer in less than ceil(Y/X) steps.
LLMs are like this - a loop is programmed to step the CPU/turn the crank until the machine emits a magic "stop" token. So in this sense, asking an LLM to be concise means reducing the amount of compute it can perform, and if you insist on it too much, it may stop so early that it was fundamentally unable to solve the problem in the computational space allotted.
This perspective requires no assumptions about "thinking" or anything human-like happening inside - it follows just from time and energy being finite :).
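To make the arithmetic concrete, here is a toy sketch in Python. It is not how any real LLM runtime works; `min_steps` and `run_until_stop` are made-up names, and "compute" is just an abstract number where every step (token) contributes at most X of it.

```python
import math


def min_steps(required_compute: float, compute_per_step: float) -> int:
    """Lower bound on the number of steps: ceil(Y / X)."""
    if compute_per_step <= 0:
        raise ValueError("compute_per_step must be positive")
    return math.ceil(required_compute / compute_per_step)


def run_until_stop(required_compute: float, compute_per_step: float, max_steps: int):
    """Toy crank loop: each step adds at most X compute.

    Returns (solved, steps_used). max_steps stands in for
    "be concise": a cap on how long the crank may turn.
    """
    done = 0.0
    for step in range(1, max_steps + 1):
        done += compute_per_step
        if done >= required_compute:
            return True, step  # enough work done, emit "stop"
    return False, max_steps  # forced to stop before the work was done


# Y = 10 units of compute needed, X = 3 units per step
print(min_steps(10, 3))          # lower bound on steps
print(run_until_stop(10, 3, 4))  # budget meets the bound
print(run_until_stop(10, 3, 3))  # budget below the bound
```

With a budget below `min_steps`, the loop cannot succeed no matter what happens inside each step, which is the whole point of the argument.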
--
[0] - I strongly think the industry is doing a huge disservice by refusing to anthropomorphize LLMs, as treating them as "little people on a chip" is the best high-level model we have for understanding their failure modes and role in larger computing systems - and instead, we just have tons of people wasting their collective efforts trying to fix the "lethal trifecta" as if it were a software bug and not a fundamental property of what makes LLMs interesting. I already wrote more on this in this thread, so I'll stop here.